BACKGROUND OF THE INVENTION



The present application claims the priority of co-pending U.S. Provisional Patent 
Applications, Serial No. 60/173,617, filed December 29, 1999, and Serial No. 
60/174,391, filed January 3, 2000, the entire disclosures of which are incorporated herein
by reference without disclaimer. The government owns rights in the present invention 
pursuant to grant number CA42557 from National Institutes of Health and CA78862-01 
from the National Cancer Institute. 



1. Field of the Invention

The present invention relates generally to the field of genome-wide gene analysis. 
More particularly, it concerns the development of a technique wherein longer sequences 
extended from SAGE tags are generated to analyze gene expression. Furthermore, it 
concerns the development of a technique wherein extended DNA sequences encoding 
parts of an isolated protein fragment are generated to identify genes encoding isolated
proteins. The invention also provides a high-throughput method for identifying genes 
encoded by SAGE tags. 



2. Description of Related Art

A particular biological event in a cell is largely controlled by the expression of 
multiple genes, both at the correct time and in a spatially appropriate manner. 
Monitoring the pattern of gene expression under various physiological and pathological 
conditions is a critical step in understanding these biological processes and for potential 
intervention. Because of the large number of genes expressed in higher eukaryotic 
genomes, powerful tools are needed to characterize the overall pattern of gene 
expression. The successful development of the SAGE technique (Serial Analysis of 
Gene Expression) is an important milestone in this regard (Velculescu et al., 1995). In
the SAGE technique, a short sequence tag of 10 nucleotides representing each
expressed sequence is excised, and the tags from different expressed sequences are ligated
for sequencing analysis. This strategy provides maximal coverage of the expressed genes
for gene identification at the whole genome level while keeping the sequencing analysis
at a manageable scale. Application of the SAGE technique has provided valuable
information in various biological systems (Zhang et al., 1997; Velculescu et al., 1997;
Madden et al., 1997; Hibi et al., 1998; Hashimoto et al., 1999).

However, there are two problems when applying the SAGE tag sequence for gene 
identification. The first is that many SAGE tags identified have no match to known 
sequences in databases (Zhang et al., 1997; Velculescu et al., 1997). These tags may
represent potentially novel genes. It is difficult, however, to use this tag information for 
further characterization of the corresponding genes because of their short length. The 
second problem is that many SAGE tag sequences have multiple matches with sequences 
in the databases. These matched sequences have no similarity to each other except that 
they share the same SAGE tag sequence. This feature makes it difficult to determine the 
correct sequence in a particular tissue corresponding to a SAGE tag among these matched
sequences.

SUMMARY OF THE INVENTION



To overcome these problems, the present inventors developed a technique called
the Generation of Longer cDNA fragments from SAGE Tags for Gene Identification
(GLGI). The key features of this technique are the use of a sequence containing a SAGE
tag as the sense primer, the use of a single-base anchored oligo-dT as the antisense
primer, and the use of Pfu DNA polymerase for PCR amplification. By using this approach, a
SAGE tag sequence can be converted immediately into a longer cDNA fragment
containing up to several hundred bases from the SAGE tag to the 3' end of the
corresponding cDNA. The development of the GLGI technique overcomes the two
obstacles discussed above and should have wide application in SAGE-related techniques 
for global analysis of gene expression. The same principle can be applied to confirm the 
reality of genes predicted by bioinformatics tools. 






Therefore, in one embodiment of the present invention, there is provided a
method for characterizing a SAGE tag fragment comprising (a) obtaining an RNA sample
from the same tissue type as used in generating said SAGE tag; (b) generating cDNA
fragments that correspond to the SAGE tag from said RNA sample by performing a DNA
amplification reaction wherein primers used comprise:

(i) a SAGE tag sequence as a sense primer; and

(ii) at least one single-base anchored oligo-dT primer as an antisense primer;

and

(c) analyzing said cDNA fragments. The RNA sample preferably is the RNA
sample used to perform SAGE. The DNA amplification preferably comprises
polymerase chain reaction, for example, using Pfu DNA polymerase. The Mg
concentration preferably is 4 mM. The cDNA fragments generated are generally about
50 to 600 base pairs in length.

The method uses single-base anchored oligo-dT primers comprising a single base
anchored to the 3' end of the oligo-dT primer, said base excluding dT, preferably
[text illegible in source].

The sense primer may further comprise a BamHI recognition sequence at the 5' end. The
SAGE tag may further comprise an NlaIII recognition sequence at the 5' end.

The method may further comprise cloning cDNA fragments, sequencing the
clones to identify the cDNA fragment sequence, and comparing the cDNA sequence to
sequences in existing DNA databases. Alternatively, the method may comprise
hybridizing the cDNA fragments with known sequences. In a more specific embodiment,
the method comprises performing a DNA amplification reaction using (a) a sense primer
designed based on an existing exon sequence, (b) a single-base anchored oligo-dT primer
as an antisense primer, and (c) cloning and sequencing the amplified DNA. Cloning may
advantageously include cloning into an expression vector, including a promoter operable
in prokaryotic or eukaryotic cells. The exon sequences may be predicted by
bioinformatics tools. The amplified sequences may be aligned with genomic DNA
sequences.






The tissue type may be colon, thymus, small intestine, heart, placenta, skeletal
muscle, testes, bone marrow, trachea, spinal cord, liver, spleen, brain, lung, ovary,
prostate, skin, cornea, retina, or breast.

The present invention also describes a method for identifying a gene comprising:
a) obtaining an isolated protein; b) digesting said protein to obtain at least a first protein
fragment; c) obtaining at least a first amino acid sequence from said first protein
fragment; d) generating a first DNA fragment that encodes said first protein fragment; e)
performing a DNA amplification reaction with cDNA obtained from the same tissue
sample as the isolated protein wherein primers used comprise: (i) a sense primer
comprising said first DNA; and (ii) at least one single-base anchored oligo-dT primer as
an antisense primer; and f) analyzing said cDNA fragments.

In one embodiment of the method, steps c) through f) are repeated with other
protein fragments. For example, the steps may be
repeated with a second protein fragment, a third protein fragment, a fourth protein fragment, or a
fifth protein fragment, to mention a few. In some specific embodiments, at least three
amino acid sequences are obtained from the protein.

In some embodiments of the method, digesting the protein is followed by a
separation to obtain purified protein fragments. The digestion may comprise the use of
proteases well known in the art such as trypsin, chymotrypsin, elastase, collagenase,
leupeptin and endopeptidases. Other protein digesting enzymes may also be used.
Separation of the digested protein fragments may be based on the size of the protein
fragments.

In specific embodiments of the method, the separation and purification may involve
protein precipitation; chromatographic techniques such as HPLC, FPLC, ion exchange
chromatography, and molecular sieve chromatography; or size separation methods such as gel
electrophoresis. Other separation and purification methods known in the art may be used
as well.

In addition, the invention also provides methods for simultaneously characterizing
a set of SAGE tag fragments comprising: a) obtaining an RNA sample; b) generating
cDNA fragments using a 3' anchored oligo dT primer for first strand synthesis; c) 
digesting the cDNA generated in step b) with an enzyme; d) isolating 3' cDNA fragments 
of the digested cDNA; e) amplifying the 3'cDNA fragments of step d) by (i) ligating a 
SAGE linker to the 3'cDNA; (ii) mixing the 3' cDNA with a sense primer comprising the 
sequence of the SAGE linker, an antisense primer comprising the sequence of the primer 
used in step b) or a fragment thereof, and a polymerase enzyme under conditions suitable 
for amplification; f) purifying the amplified 3'cDNA fragments obtained in step e); g) 
performing a second amplification comprising generation of longer cDNA fragments 
from SAGE tags in a multi-well format by mixing said 3' cDNA fragments with a sense 
primer comprising a SAGE tag sequence and a restriction enzyme sequence, an antisense 
primer comprising the sequence of the primer used in step b) or a fragment thereof, and a
polymerase enzyme under conditions suitable for amplification; and h) cloning and
sequencing the products generated in step g).

The 3' anchored oligo dT primer for first strand synthesis can be further attached 
to an affinity label such as biotin. This allows for isolation of the cDNA or fragments 
thereof by an affinity-based isolating method using for example streptavidin to recognize 
and bind the biotin. However, as will be recognized by the skilled artisan, one is not 
restricted to the use of streptavidin and biotin and any affinity label system may be used, 
for example, any antigen and its corresponding antibody, etc. 

The enzyme used to digest the cDNA in step c) can be a restriction
enzyme, for example NlaIII. In a preferred embodiment the polymerase enzyme used in
steps e) and g) of the method is PLATINUM Taq which provides high specificity and 
increases yield of the final product. 
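
As an illustration of steps c) and d) above, the following Python sketch models the digestion on the sense strand of a cDNA and keeps only the 3'-most fragment. It is a simplified in-silico picture and not part of the disclosed protocol: the function name `three_prime_fragment` and the example sequence are hypothetical, and strand overhangs and affinity capture are ignored. The only sequence fact assumed is that NlaIII recognizes CATG.

```python
from typing import Optional

NLAIII_SITE = "CATG"  # NlaIII recognition sequence


def three_prime_fragment(cdna: str, site: str = NLAIII_SITE) -> Optional[str]:
    """Return the sense-strand cDNA from its 3'-most restriction site to the 3' end.

    Returns None when the cDNA contains no site, in which case no 3' fragment
    would be released by the digestion in this simplified model.
    """
    seq = cdna.upper()
    pos = seq.rfind(site)  # index of the last (3'-most) occurrence
    if pos == -1:
        return None
    return seq[pos:]


# Hypothetical cDNA with two CATG sites and a poly-A tail.
example = "GGCATG" + "AAACCC" + "CATG" + "TTTGGG" + "A" * 8
print(three_prime_fragment(example))
```

Only the fragment downstream of the last site is retained, which mirrors the isolation of 3' cDNA fragments described in step d).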

The steps of cloning and sequencing are well known to the skilled artisan and
generally comprise: a) precipitating and purifying the amplified products of step g) in
the multi-well format; b) cloning the purified products into a vector; c) transforming
competent bacteria with cloned products; d) screening for transformants; and e)
sequencing DNA from transformants to identify the gene encoded by the SAGE tag. In
specific embodiments, the positive transformants are screened by direct colony-PCR™
amplifications.

In preferred embodiments of this method, more than one SAGE tag is
simultaneously identified. This multiple identification provides for high throughput. The
high-throughput generation of longer cDNA fragments from SAGE tags for gene identification (GLGI)
procedure has several important features, for example: (i) 3' cDNAs instead of full-length
cDNAs are used as the templates for GLGI amplification. This prevents artificial
amplification from non-specific annealing of the sense primer. The 3' cDNAs can be
amplified to provide sufficient templates for GLGI amplification; (ii) a single antisense
primer (in one example the primer is 5'-ACTATCTAGAGCGGCCGCTT-3'; see also
Example 3) is used for GLGI amplification in place of the
anchored oligo dT primers. The sequence of the antisense primer is located at the 3' end of
all the cDNA templates, having been incorporated from the anchored oligo dT primers used for the first
strand cDNA synthesis. Use of a single primer also increases the efficiency of GLGI
amplification significantly, as any annealing of this primer with the 3' end sequence results in
extension during PCR. This feature is particularly useful for amplifying templates present at
low copy number; (iii) use of PLATINUM Taq polymerase instead of Pfu DNA polymerase
increases the yield of final products, while maintaining high specificity; (iv) the GLGI
amplified DNAs are directly precipitated and cloned into vector without gel purification, 
which further prevents loss of amplified products. The inventors contemplate that this is 
especially important for products with short sizes and for products generated from 
templates with low copies. Thus, the methods of this invention provide the ability for 
large-scale identification of expressed genes. Genes of any eukaryotic origin, including 
human genes may therefore be identified at an accelerated rate by the simple, efficient 
and low-cost methods set forth herein.

Using the standard convention, "a" or "an" is defined herein to mean one or more
than one. Other objects, features and advantages of the present invention will become 
apparent from the following detailed description. It should be understood, however, that 
the detailed description and the specific examples, while indicating preferred 
embodiments of the invention, are given by way of illustration only, since various 
changes and modifications within the spirit and scope of the invention will become 
apparent to those skilled in the art from this detailed description. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The following drawings form part of the present specification and are included to 
further demonstrate certain aspects of the present invention. The invention may be better 
understood by reference to one or more of these drawings in combination with the 
detailed description of specific embodiments presented herein. 



FIG. 1. Schematic for GLGI. (FIG. 1A) In this process, first strand cDNA
synthesized by oligo-dT is used for PCR. In the first cycle, the template with the SAGE
tag binding site is annealed by the sense primer and extended to the end of the template.
In the second cycle, extension only occurs from the anchored oligo-dT primer annealed
and paired correctly at the beginning of the poly-dA sequence. Exponential amplification
only occurs for the template with the SAGE tag binding site. (FIG. 1B) GLGI results in
the conversion of a 10-base SAGE tag into a 3' cDNA fragment of hundreds of bases.

FIG. 2. Size distribution of NlaIII digested cDNA. Double strand cDNA was
digested by NlaIII and electrophoresed on a 1.5% agarose gel to demonstrate the size
distribution of the digested fragments.

FIG. 3. Specific amplification of 3' sequences corresponding to a specific SAGE
tag sequence by GLGI. In the PCR reaction, each SAGE tag sequence was used as the
sense primer, and each single dA-, dG- or dC-anchored oligo-dT primer, or a mixture of the
three anchored oligo-dT primers, was used as the antisense primer. The 3'-end nucleotide
for Hs.184776 is dT, for Hs.3463 is dC, and for Hs.118786 is dG.

FIG. 4. Comparison between RAST-PCR method and GLGI method. A set of 4 
SAGE tags was chosen for the analysis. The same RNA from human colon and sense 
primers were used for both methods. The conditions used for RAST-PCR followed the 
procedures described in reference (Van den Berg et al., 1999).

FIG. 5. Schematic for high-throughput GLGI.

FIG. 6. Schematic for high-throughput GLGI amplification.

FIG. 7. Identification of correct 3' sequences for multiple matched SAGE tags.
SAGE tags with multiple matches were selected from the high abundant, intermediate
abundant and low abundant copies, and those tags were used as the sense primer for
GLGI amplification. Gel demonstration of the 3' cDNAs amplified through GLGI.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

A. The Present Invention 

The inventors have developed a technique called the Generation of Longer cDNA 
fragments from SAGE tags for Gene Identification (GLGI), which converts SAGE tags, 
which are about 10 base pairs in length, into their corresponding 3' cDNA fragments
covering hundreds of bases. The sense primer used comprises about 10 bases corresponding
to a SAGE Tag and the antisense primer comprises a single base anchored to an oligo-dT 
primer. The single base may be dA, dG, or dC. PCR amplification using the primers 
described above generates a cDNA fragment extending from the SAGE Tag toward the 3' 
end of the corresponding sequence. 
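
The primer pair described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the 10-base tag in the example is made up, and the optional 5' extension uses the standard BamHI (GGATCC) and NlaIII (CATG) recognition sequences mentioned in the Summary. The oligo-dT length of 11 follows the preferred length given in the Primer Design section.

```python
from typing import List

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
NLAIII_SITE = "CATG"   # NlaIII recognition sequence


def glgi_sense_primer(tag: str, add_sites: bool = True) -> str:
    """Build a sense primer from a 10-base SAGE tag, optionally with 5' sites."""
    tag = tag.upper()
    if len(tag) != 10 or set(tag) - set("ACGT"):
        raise ValueError("expected a 10-base tag containing only A, C, G, T")
    return BAMHI_SITE + NLAIII_SITE + tag if add_sites else tag


def anchored_oligo_dt_primers(dt_length: int = 11) -> List[str]:
    """Return the three single-base anchored oligo-dT primers (anchor = A, G or C)."""
    return ["T" * dt_length + anchor for anchor in "AGC"]


# Hypothetical tag used only to show the shape of the output.
print(glgi_sense_primer("ACGTACGTAC"))
print(anchored_oligo_dt_primers())
```

The anchor base is written at the 3' end of each antisense primer, so extension can start only where that base pairs with the nucleotide adjacent to the poly-dA stretch.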

Application of the GLGI technique solves two critical issues in the application of
the SAGE technique: (i) longer fragments corresponding to novel SAGE tags can be
generated for further studies; and (ii) distinct fragments corresponding to a single SAGE
tag can be identified and distinguished. Thus, the development of the GLGI method
provides several potential applications. First, it provides a strategy for even wider
application of the SAGE technique for quantitative analysis of global gene expression.
Second, it can be used to identify the 3' cDNA sequence from any exon within a gene.
These exons include ones predicted by bioinformatic tools. Third, a combined
application of SAGE/GLGI can be used to complete the catalogue of the expressed genes
in human and in other eukaryotic species. And fourth, a combined application of
SAGE/GLGI can be applied to define the 3' boundary of expressed genes in the genomic
sequences in human and in other eukaryotic genomes.

In the present invention the GLGI technique is further developed herein to
identify genes encoding isolated proteins. Isolated proteins are digested by methods
known to one of ordinary skill in the art. The protein fragments are then used to obtain
amino acid sequences, from which the encoding nucleotide sequences are derived. These
nucleotide sequences are then used in GLGI, wherein a DNA amplification reaction is
performed using these nucleotide sequences as sense primers and using a single-base
anchored poly-dT sequence as an antisense primer. This allows the amplification of DNA
towards the 3' end of the gene encoding the isolated protein. Thus, the combination of GLGI with
peptide/protein sequencing provides a novel method for gene identification starting with
an isolated protein.
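
Because most amino acids are encoded by more than one codon, a given peptide sequence corresponds to a family of possible nucleotide sequences. The following Python sketch, offered only as an illustration, counts that family for a peptide using the standard genetic code and picks the least degenerate stretch of a peptide as a candidate region for a sense primer. The helper names and the example peptide are hypothetical, and the specification does not prescribe this selection rule.

```python
import math
from typing import Tuple

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def coding_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide.

    Raises KeyError for characters that are not one of the 20 standard residues.
    """
    return math.prod(CODONS_PER_RESIDUE[aa] for aa in peptide.upper())


def least_degenerate_window(peptide: str, length: int) -> Tuple[int, str, int]:
    """Return (start, window, degeneracy) for the least degenerate window."""
    peptide = peptide.upper()
    if length < 1 or length > len(peptide):
        raise ValueError("window length must be between 1 and the peptide length")
    candidates = [
        (coding_degeneracy(peptide[i:i + length]), i)
        for i in range(len(peptide) - length + 1)
    ]
    degeneracy, start = min(candidates)  # ties resolve to the leftmost window
    return start, peptide[start:start + length], degeneracy


# Hypothetical peptide fragment.
print(least_degenerate_window("LSRMWKD", 4))
```

Windows rich in methionine and tryptophan give the fewest possible encoding sequences, which is why they are commonly favored when a primer must be designed from protein sequence.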

The GLGI method is still further developed herein into a high-throughput method
for simultaneously converting a large set of SAGE tags into their 3' cDNAs, thereby
simultaneously characterizing a set of SAGE tag fragments. The method provides for
generation of cDNA fragments using a 3' anchored oligo dT primer for first strand
synthesis from an RNA sample, digesting this cDNA with an enzyme and isolating and
amplifying 3' cDNA fragments. Re-amplifying the 3' cDNA fragments in a multi-well
format by GLGI amplification generates longer cDNA fragments corresponding to
multiple SAGE tags. Cloning and sequencing then allows identification of the gene.
This procedure is simple, rapid, efficient and low-cost and therefore provides a tool for
large-scale identification of expressed genes. Thus, genes of eukaryotic origin, such as
human genes, may be identified at an accelerated rate.

B. Serial Analysis of Gene Expression (SAGE) 

The method for serial analysis of gene expression is described in U.S. Patent 
5,866,330 to Kinzler et al., which is incorporated herein by reference. The method
involves the identification of a short nucleotide sequence tag at a defined position in a 
messenger RNA. The tag is used to identify the corresponding transcript and gene from 
which it was transcribed. By utilizing concatenated tags a rapid quantitative and 
qualitative analysis of expressed genes is possible. SAGE is thus useful as a gene 
discovery tool for the identification of known genes and novel sequence tags 
corresponding to novel transcripts and genes. 
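
A minimal in-silico sketch of tag extraction and counting is given below. It assumes, as is common in SAGE practice but not stated in the paragraph above, that the defined position is the 3'-most CATG (NlaIII) site of a transcript and that the tag is the 10 bases immediately downstream of it. The function names and the toy transcripts are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def extract_sage_tag(transcript: str, anchor_site: str = "CATG",
                     tag_length: int = 10) -> Optional[str]:
    """Return the tag following the 3'-most anchor site, or None if unavailable."""
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA letters
    pos = seq.rfind(anchor_site)
    if pos == -1:
        return None
    start = pos + len(anchor_site)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(transcripts: Iterable[str]) -> Counter:
    """Tally tags over a collection of transcripts, skipping those without a tag."""
    tags = (extract_sage_tag(t) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)


# Toy transcripts: the first two differ upstream but share the same tag.
toy = [
    "GGGGCATG" + "ACGTACGTAC" + "TTTT" + "A" * 10,
    "CCCCCCCATG" + "ACGTACGTAC" + "GGGG" + "A" * 10,
    "TTTTCATG" + "GGGGGCCCCC" + "A" * 10,
]
print(count_tags(toy))
```

The first two toy transcripts yield the same tag although their upstream sequences differ, which is the multiple-match situation that longer 3' fragments are intended to resolve.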

C. Oligonucleotide Probes and Primers 

The present invention, in various aspects, will involve the use of nucleic acid
hybridization. Hybridization occurs between nucleic acids that have a given degree of
"complementarity." Nucleic acid sequences that are "complementary" are those that are
capable of base-pairing according to the standard Watson-Crick complementarity rules. As
used herein, the term "complementary sequences" means nucleic acid sequences that are 
substantially identical, or as defined as being capable of annealing to a target nucleic acid 
segment being described under relatively stringent conditions such as those described 
herein. 
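
Watson-Crick complementarity can be checked mechanically. The short Python sketch below computes a reverse complement and counts mismatched positions between a probe and a target annealed in antiparallel orientation. The function names are hypothetical and the model treats only A-T and G-C pairs as matches, with no allowance for stringency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(probe: str, target: str) -> int:
    """Count positions where the probe fails to pair with an equal-length target."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    perfect_probe = reverse_complement(target)
    return sum(a != b for a, b in zip(probe.upper(), perfect_probe))


print(reverse_complement("AACG"))
print(count_mismatches("CGTA", "AACG"))
```

A count of zero corresponds to perfectly complementary sequences; small non-zero counts correspond to the mismatched hybrids that lower-stringency conditions can tolerate.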

The term primer, as defined herein, is meant to encompass any nucleic acid that is 
capable of priming the synthesis of a nascent nucleic acid in a template-dependent 
process. Typically, primers are oligonucleotides from ten to twenty-five base pairs in 
length, but longer sequences can be employed. Primers may be provided in double- 
stranded or single-stranded form, although the single-stranded form is preferred. Probes 
are defined differently, although they may act as primers. Probes, while perhaps capable 
of priming, are designed to bind to the target DNA or RNA and need not be used in an
amplification process.

Primers should be of sufficient length to provide specific annealing to an RNA or
DNA tissue sample. The use of a primer of about 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 20-25, 25-30, 30-35 or 35-40 nucleotides in length allows the formation of a
duplex molecule that is both stable and selective. Of particular importance are SAGE-derived
primers, which range from about 10 to 30 bases.

As a general rule, shorter oligomers are easier to make. However, numerous other
factors are involved in determining usefulness. Both binding affinity and sequence
specificity of an oligonucleotide to its complementary target increase with increasing
length. It is contemplated that exemplary oligonucleotides of 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more base
pairs will be used, although others are contemplated. Longer polynucleotides encoding 250,
300, 500, 600, 700, 800, and longer are contemplated as well. Accordingly, nucleotide
sequences may be selected for their ability to selectively form duplex molecules with
complementary stretches of genes or RNAs or to provide primers for amplification of
DNA or RNA from cells, cell lysates and tissues. The method of using probes and primers
of the present invention is in the selective amplification and detection of genes, changes in
gene expression, changes in mRNA expression wherein one could be detecting virtually any
gene or genes of interest from any species. The target polynucleotide will be RNA
molecules, mRNA, cDNA or amplified DNA. By varying the stringency of annealing, and
the region of the primer, different degrees of homology may be discovered.
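
The dependence of duplex stability on length and base composition can be illustrated with the Wallace rule, a widely used rule of thumb for short oligonucleotides that is not taken from this specification: Tm is approximately 2 °C for each A or T plus 4 °C for each G or C. The Python sketch below applies it; the function name and example primers are hypothetical, and the estimate is rough and is generally considered useful only for oligonucleotides of roughly 14 to 20 bases.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate the melting temperature (deg C) of a short oligo by the Wallace rule."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence may contain only A, C, G and T")
    return 2 * at + 4 * gc


# A bare 10-base tag versus the same tag with a 10-base 5' extension.
print(wallace_tm("ACGTACGTAC"))
print(wallace_tm("GGATCCCATG" + "ACGTACGTAC"))
```

Under this approximation, extending a 10-base tag with additional 5' bases raises the estimated melting temperature, consistent with the statement that binding affinity increases with length.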

Primers may be chemically synthesized by methods well known within the art.
Chemical synthesis methods allow detectable labels such as
fluorescent labels, radioactive labels, etc., to be placed virtually anywhere within the
polynucleotide sequence. Solid-phase methods of synthesis also may be used.

The amplification primers may be attached to a solid phase, for example, a latex
bead, a magnetic bead, or the surface of a chip. Thus, the amplification carried out using
these primers will be on a solid support/surface.

Furthermore, some primers of the present invention may have a recognition
moiety attached. A wide variety of appropriate recognition means are known in the art, 
including fluorescent labels, radioactive labels, mass labels, affinity labels, 
chromophores, dyes, electroluminescence, chemiluminescence, enzymatic tags, or other 
ligands, such as avidin/biotin, or antibodies, which are capable of being detected and are 
described below. 

1. Primer Design 

According to the present invention, there are disclosed, in one aspect, oligo-dT 
primers for use in reverse transcription and amplification reactions. These primers are 
single-base 3'-anchored, i.e., contain a single base at their 3' ends. This base is
A, G or C. This creates a set of three primers.
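
Which of the three anchored primers can extend on a given template depends on the nucleotide immediately 5' of the poly-A tail on the sense strand, because the anchor must pair with it. The Python sketch below makes this explicit. It is an illustration under simplifying assumptions: the function name, the minimum tail length of 5 and the example sequences are hypothetical.

```python
from typing import Optional

# Anchor base (on the antisense primer) that pairs with the last non-A sense base.
ANCHOR_FOR_LAST_BASE = {"T": "A", "C": "G", "G": "C"}


def matching_anchor(sense_cdna: str, min_tail: int = 5) -> Optional[str]:
    """Return the anchor base (A, G or C) able to prime at the start of the poly-A tail.

    Returns None if the sequence has no poly-A tail of at least min_tail bases
    or if the base preceding the tail is not T, C or G.
    """
    seq = sense_cdna.upper()
    body = seq.rstrip("A")  # everything 5' of the terminal run of A
    tail_length = len(seq) - len(body)
    if tail_length < min_tail or not body:
        return None
    return ANCHOR_FOR_LAST_BASE.get(body[-1])


print(matching_anchor("GGCT" + "A" * 12))
print(matching_anchor("GGCC" + "A" * 12))
print(matching_anchor("GGCG" + "A" * 12))
```

An A immediately before the tail cannot be told apart from the tail itself, which is one way to see why a dT anchor is excluded from the primer set.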

The particular length of the primer is not believed to be critical, with the dT
sequence ranging from about 10 to about 25 bases, with 11 being a preferred
embodiment. In some embodiments, the primers are labeled with radioactive species
(³²P, ¹⁴C, ³⁵S, ³H, or other isotope), with a fluorophore (rhodamine, fluorescein, GFP) or a
chemiluminescent label (luciferase).

Yet another primer specific to this invention is the sense primer, which comprises a
SAGE tag sequence. A discussion of these primers is provided in U.S. Patent 5,866,330 to
Kinzler et al., which is incorporated herein by reference. Other exon-specific or gene- 
specific primers may be used for the sequencing and characterizing of amplified 
sequences. 

2. Probes 

In various contexts, it may be useful to use oligo- or polynucleotides as probes for 
complementary or hybridizing DNA or RNA molecules. In this regard, one may include 
particular "target" sequences in the oligos of the present invention in order to detect the
products by probe hybridization. Alternatively, the probes may recognize unique
sequences in the amplified regions upstream of the anchored oligo-dT primers. 

3. Primer Synthesis 

Oligonucleotide synthesis is performed according to standard methods. See, for 
example, Itakura and Riggs (1980). Additionally, U.S. Patent 4,704,362; U.S. Patent
5,221,619; and U.S. Patent 5,583,013 each describe various methods of preparing synthetic
structural genes. 

Oligonucleotide synthesis is well known to those of skill in the art. Various
different mechanisms of oligonucleotide synthesis have been disclosed in, for example,
U.S. Patents 4,659,774, 4,816,571, 5,141,813, 5,264,566, 4,959,463, 5,428,148,
5,554,744, 5,574,146, 5,602,244, each of which is incorporated herein by reference.
Basically, chemical synthesis can be achieved by the diester method, the triester method,
the polynucleotide phosphorylase method and solid-phase chemistry. These methods are
discussed in further detail below.

Diester method. The diester method was the first to be developed to a usable 
state, primarily by Khorana and co-workers (Khorana, 1979). The basic step is the 
joining of two suitably protected deoxynucleotides to form a dideoxynucleotide 
containing a phosphodiester bond. The diester method is well established and has been 
used to synthesize DNA molecules (Khorana, 1979). 

Triester method. The main difference between the diester and triester methods 
is the presence in the latter of an extra protecting group on the phosphate atoms of the 
reactants and products (Itakura et al., 1975). The phosphate protecting group is usually a
chlorophenyl group, which renders the nucleotides and polynucleotide intermediates
soluble in organic solvents. Therefore purifications are done in chloroform solutions.
Other improvements in the method include (i) the block coupling of trimers and larger 
oligomers, (ii) the extensive use of high-performance liquid chromatography for the 
purification of both intermediate and final products, and (iii) solid-phase synthesis.

Polynucleotide phosphorylase method. This is an enzymatic method of DNA
synthesis that can be used to synthesize many useful oligodeoxynucleotides (Gillam et
al., 1978; Gillam et al., 1979). Under controlled conditions, polynucleotide
phosphorylase adds predominantly a single nucleotide to a short oligodeoxynucleotide.
Chromatographic purification allows the desired single adduct to be obtained. At least a
trimer is required to start the procedure, and this primer must be obtained by some other
method. The polynucleotide phosphorylase method works and has the advantage that the
procedures involved are familiar to most biochemists.

Solid-phase methods. Drawing on the technology developed for the solid-phase
synthesis of polypeptides, it has been possible to attach the initial nucleotide to solid
support material and proceed with the stepwise addition of nucleotides. All mixing and
washing steps are simplified, and the procedure becomes amenable to automation. These
syntheses are now routinely carried out using automatic DNA synthesizers.


Phosphoramidite chemistry (Beaucage and Iyer, 1992) has become by far the
most widely used coupling chemistry for the synthesis of oligonucleotides. As is well
known to those skilled in the art, phosphoramidite synthesis of oligonucleotides involves
activation of nucleoside phosphoramidite monomer precursors by reaction with an
activating agent to form activated intermediates, followed by sequential addition of the 
activated intermediates to the growing oligonucleotide chain (generally anchored at one 
end to a suitable solid support) to form the oligonucleotide product. 

D. Amplification 

PCR™. In some embodiments, poly-A mRNA is isolated and reverse transcribed
(referred to as RT) to obtain cDNA, which is then used as a template for polymerase chain
reaction (referred to as PCR™) based amplification. In other embodiments, cDNA may
be obtained and used as a template for the PCR™ reaction. In PCR™, pairs of primers
that selectively hybridize to nucleic acids are used under conditions that permit selective
hybridization. The term primer, as used herein, encompasses any nucleic acid that is
capable of priming the synthesis of a nascent nucleic acid in a template-dependent
process. Primers may be provided in double-stranded or single-stranded form, although
the single-stranded form is preferred. 

The primers are used in any one of a number of template-dependent processes to
amplify the target-gene sequences present in a given template sample. One of the best
known amplification methods is PCR™, which is described in detail in U.S. Patents
4,683,195, 4,683,202 and 4,800,159, each incorporated herein by reference.

In PCR™, two primer sequences are prepared which are complementary to
regions on opposite complementary strands of the target-gene(s) sequence. The primers
will hybridize to form a nucleic-acid:primer complex if the target-gene(s) sequence is
present in a sample. An excess of deoxynucleoside triphosphates is added to a reaction
mixture along with a DNA polymerase, e.g., Taq polymerase, that facilitates
template-dependent nucleic acid synthesis.

If the target-gene(s) sequence:primer complex has been formed, the polymerase
will cause the primers to be extended along the target-gene(s) sequence by adding on
nucleotides. By raising and lowering the temperature of the reaction mixture, the
extended primers will dissociate from the target-gene(s) to form reaction products, excess
primers will bind to the target-gene(s) and to the reaction products and the process is
repeated. These multiple rounds of amplification, referred to as "cycles," are conducted
until a sufficient amount of amplification product is produced.

Next, the amplification product is detected. In certain applications, the detection
may be performed by visual means. Alternatively, the detection may involve indirect
identification of the product via fluorescent labels, chemiluminescence, radioactive
scintigraphy of incorporated radiolabel or incorporation of labeled nucleotides, mass
labels or even via a system using electrical or thermal impulse signals (Affymax
technology).

A reverse transcriptase PCR™ amplification procedure may be performed in order
to quantify the amount of mRNA amplified. Methods of reverse transcribing RNA into 
cDNA are well known and described in Sambrook et al, 1989. Alternative methods for 
reverse transcription utilize thermostable DNA polymerases. These methods are 
described in WO 90/07641, filed December 21, 1990. 

E. Hybridization 

Hybridization is the technique used to identify nucleic acid products by the nature 
of the complementarity of a target gene to the hybridization probe or primer. Varying 
degrees of probe/primer selectivity towards target sequence can be measured. 

For applications requiring high selectivity, one typically will employ relatively 
stringent conditions to form the hybrids, e.g., one will select relatively low salt and/or 
high temperature conditions, such as provided by about 0.02 M to about 0.10 M NaCl at 

little, if any, mismatch between the probe and the template or target strand, and would be 
particularly suitable for detecting specific genes or specific mRNA transcripts. It is 
generally appreciated that conditions can be rendered more stringent by the addition of
increasing amounts of formamide.

For certain applications, it is appreciated that lower stringency conditions are 
required. Under these conditions, hybridization may occur even though the sequences of 
probe/primer and target strand are not perfectly complementary, but are mismatched at 
one or more positions. Conditions may be rendered less stringent by increasing salt
concentration and decreasing temperature. For example, a medium stringency condition
could be provided by about 0.1 to 0.25 M NaCl at temperatures of about 37°C to about
55°C, while a low stringency condition could be provided by about 0.15 M to about 0.9
M salt, at temperatures ranging from about 20°C to about 55°C. Thus, hybridization
conditions can be readily manipulated, and thus will generally be a method of choice
depending on the desired results.
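
The way salt, temperature and formamide interact can be made concrete with an estimate of hybrid melting temperature. The Python sketch below is illustrative only and is not taken from this disclosure: it assumes the widely used empirical approximation for long DNA:DNA hybrids, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 0.61(% formamide) - 500/L, and the function names are hypothetical.

```python
import math


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def estimate_tm(gc_pct: float, length: int, na_molar: float,
                formamide_pct: float = 0.0) -> float:
    """Approximate Tm (deg C) of a long DNA:DNA hybrid.

    Assumed empirical formula (not from the source text); it is a rough
    guide and is not suitable for short oligonucleotides.
    """
    if length <= 0 or na_molar <= 0:
        raise ValueError("length and na_molar must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_pct
            - 0.61 * formamide_pct - 500.0 / length)


if __name__ == "__main__":
    # Lower salt or added formamide lowers Tm, so a fixed hybridization
    # temperature becomes more stringent.
    for salt in (0.02, 0.1, 0.25, 0.9):
        print(salt, round(estimate_tm(50.0, 100, salt), 1))
    print(round(estimate_tm(50.0, 100, 0.1, formamide_pct=50.0), 1))
```

The closer the hybridization temperature sits to the estimated Tm, the fewer mismatches are tolerated, which is consistent with low salt and high temperature giving high stringency.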
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In other embodiments, hybridization may be achieved under conditions of, for 
example, 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 10 mM dithiothreitol, at
temperatures between approximately 20°C to about 37°C. Other hybridization conditions
utilized could include approximately 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 µM
MgCl2, at temperatures ranging from approximately 40°C to about 72°C.

The selected conditions will depend on the particular circumstances based on the 
particular criteria required (depending, for example, on the G+C content, type of target 
nucleic acid, source of nucleic acid, size of hybridization probe, etc.). Following washing 
of the hybridized surface to remove non-specifically bound probe/primer molecules,
hybridization is detected, or even quantified, by means of the label. 

In general, it is envisioned that hybridization with respect to the primers described 
herein or in the context of probes will be useful both in solution hybridization, as in 



reference gene expression, as well as in embodiments employing a solid phase. In 
embodiments involving a solid phase, the test DNA (or RNA) can be adsorbed or 
otherwise affixed (for example, by affinity separation methods) to a selected matrix or
surface. This fixed, single-stranded nucleic acid can then be subject to hybridization with
selected probes or primers under desired conditions. Alternatively, the probe or primer
may be fixed to the selected matrix or surface for gene detection. Suitable surfaces
include chips, latex beads or plates. 

F. cDNA Synthesis 

In a preferred embodiment of the invention, the assay is employed for analyzing
gene expression patterns using RNA as the starting template. The RNA template may be
presented as either total cellular RNA or isolated mRNA. Both types of sample yield
comparable results. In still further embodiments, other types of nucleic acids may serve
as template in the assay, including genomic or extragenomic DNA, viral RNA or DNA,
or nucleic acid polymers generated by non-replicative or artificial means.

In a preferred embodiment of the invention, RNA is converted to cDNA using an
oligo-dT primer. Methods of reverse transcribing RNA into cDNA are well known, and
described in Sambrook et al., 1989. Alternative methods for reverse transcription utilize
thermostable DNA polymerases. These methods are described in WO 90/07641. In
alternative embodiments, avian myeloblastosis virus reverse transcriptase (AMV-RT), or
Moloney murine leukemia virus reverse transcriptase (MoMLV-RT) may be used. Other
enzymes are contemplated as well.

In another embodiment, RNA targets may be reverse transcribed using other
non-specific primers, such as an anchored oligo-dT primer, or random sequence primers.
An advantage of this embodiment is that the "unfractionated" quality of the mRNA
sample is maintained because the sites of priming are non-specific, i.e., the products of
this RT reaction will serve as template for any desired target in the subsequent PCR™
amplification. This allows samples to be archived in the form of DNA, which is more
stable than RNA.

G. Sequencing

Methods for sequencing are well known in the art, in particular, the chain-termination
technique pioneered by Sanger et al. in the mid-1970's. Recent
developments have increased dramatically the number of bases that can be sequenced in a
short period of time. The following U.S. patents, dealing with DNA sequencing, are
incorporated by reference: U.S. Patents 6,004,446; 5,985,556; 5,968,743; 5,876,934;
5,866,328; 5,858,671; 5,846,727; 5,821,060; 5,821,058; 5,817,797; 5,780,232; 5,755,943;
5,674,716; 5,639,608; 5,608,063; 5,523,206; 5,455,008; 5,432,065; 5,405,746; 5,360,523;
5,308,751; and 5,207,880.

H. Restriction Enzymes

Restriction enzymes recognize specific short DNA sequences four to eight
nucleotides long (see Table 1), and cleave the DNA at a site within this sequence.
Restriction enzymes are used to cleave cDNA molecules at sites corresponding to various
restriction-enzyme recognition sites. In the context of this invention, the enzyme NlaIII is
often used in the SAGE technique and the SAGE tags often comprise NlaIII
recognition sequences. The sense primers in the present invention may further comprise
a restriction enzyme recognition sequence, such as the BamHI sequence, to allow easier
cloning of amplified DNA fragments for further analysis.

As the sequence of the recognition site is known (see list below), primers can be
designed comprising nucleotides corresponding to the recognition sequences. If the
primer sets have, in addition to the restriction recognition sequence, degenerate sequences
corresponding to different combinations of nucleotide sequences, one can use the
amplified cDNA fragments that have the particular restriction enzyme sequence for
cloning the cDNA into cloning vectors. The list below exemplifies the currently known
restriction enzymes that may be used in the invention.
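
Because the recognition sequences are written with standard IUPAC ambiguity codes (R, Y, N and so on), locating sites in a cDNA sequence is a simple pattern search. The Python sketch below is an illustration under stated assumptions, not part of the disclosed method: the helper names are hypothetical, only the strand as given is searched, and only a handful of entries are included.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# A few recognition sequences, written as they appear in the enzyme list.
SITES = {"NlaIII": "CATG", "BamHI": "GGATCC", "AvaI": "CYCGRG", "HincII": "GTYRAC"}


def site_regex(recognition: str) -> "re.Pattern[str]":
    """Compile a recognition sequence into a regex that also finds overlapping sites."""
    body = "".join(IUPAC[base] for base in recognition.upper())
    return re.compile("(?=(" + body + "))")  # lookahead keeps overlapping matches


def find_sites(seq: str, recognition: str) -> list:
    """Return 0-based start positions of the recognition sequence on the given strand."""
    return [m.start() for m in site_regex(recognition).finditer(seq.upper())]


if __name__ == "__main__":
    cdna = "AACATGGGATCCCATG"
    for enzyme, site in SITES.items():
        print(enzyme, site, find_sites(cdna, site))
```

For non-palindromic recognition sequences the reverse complement would also have to be searched; that step is omitted here for brevity.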



Table 1: Restriction Enzymes

Enzyme Name   Recognition Sequence

AatII         GACGTC
Acc65I        GGTACC
AccI          GTMKAC
AciI          CCGC
AclI          AACGTT
AfeI          AGCGCT
AflII         CTTAAG
AflIII        ACRYGT
AgeI          ACCGGT
AhdI          GACNNNNNGTC
AluI          AGCT
AlwI          GGATC
AlwNI         CAGNNNCTG
ApaI          GGGCCC
ApaLI         GTGCAC
ApoI          RAATTY
AscI          GGCGCGCC
AseI          ATTAAT
AvaI          CYCGRG
AvaII         GGWCC
AvrII         CCTAGG
BaeI          NACNNNNGTAPyCN
BamHI         GGATCC
BanI          GGYRCC
BanII         GRGCYC
BbsI          GAAGAC
BbvI          GCAGC
BbvCI         CCTCAGC
BcgI          CGANNNNNNTGC
BciVI         GTATCC
BclI          TGATCA
BfaI          CTAG
BglI          GCCNNNNNGGC
BglII         AGATCT
BlpI          GCTNAGC
BmrI          ACTGGG
BpmI          CTGGAG
BsaAI         YACGTR
BsaBI         GATNNNNATC
BsaHI         GRCGYC
BsaI          GGTCTC
BsaJI         CCNNGG
BsaWI         WCCGGW
BseRI         GAGGAG
BsgI          GTGCAG
BsiEI         CGRYCG
BsiHKAI       GWGCWC
BsiWI         CGTACG
BslI          CCNNNNNNNGG
BsmAI         GTCTC
BsmBI         CGTCTC
BsmFI         GGGAC
BsmI          GAATGC
BsoBI         CYCGRG
Bsp1286I      GDGCHC
BspDI         ATCGAT
BspEI         TCCGGA
BspHI         TCATGA
BspMI         ACCTGC
BsrBI         CCGCTC
BsrDI         GCAATG
BsrFI         RCCGGY
BsrGI         TGTACA
BsrI          ACTGG
BssHII        GCGCGC
BssKI         CCNGG
Bst4CI        ACNGT
BssSI         CACGAG
BstAPI        GCANNNNNTGC
BstBI         TTCGAA
BstEII        GGTNACC
BstF5I        GGATGNN
BstNI         CCWGG
BstUI         CGCG
BstXI         CCANNNNNNTGG
BstYI         RGATCY
BstZ17I       GTATAC
Bsu36I        CCTNAGG
BtgI          CCPuPyGG
BtrI          CACGTG
Cac8I         GCNNGC
ClaI          ATCGAT
DdeI          CTNAG
DpnI          GATC
DpnII         GATC
DraI          TTTAAA
DraIII        CACNNNGTG
DrdI          GACNNNNNNGTC
EaeI          YGGCCR
EagI          CGGCCG
EarI          CTCTTC



EciI          GGCGGA
EcoNI         CCTNNNNNAGG
EcoO109I      RGGNCCY
EcoRI         GAATTC
EcoRV         GATATC
FauI          CCCGCNNNN
Fnu4HI        GCNGC
FokI          GGATG
FseI          GGCCGGCC
FspI          TGCGCA
HaeII         RGCGCY
HaeIII        GGCC
HgaI          GACGC
HhaI          GCGC
HincII        GTYRAC
HindIII       AAGCTT
HinfI         GANTC
HinP1I        GCGC
HpaI          GTTAAC
HpaII         CCGG
HphI          GGTGA
KasI          GGCGCC
KpnI          GGTACC
MboI          GATC
MboII         GAAGA
MfeI          CAATTG
MluI          ACGCGT
MlyI          GAGTCNNNNN
MnlI          CCTC
MscI          TGGCCA
MseI          TTAA
MslI          CAYNNNNRTG
MspA1I        CMGCKG
MspI          CCGG
MwoI          GCNNNNNNNGC
NaeI          GCCGGC
NarI          GGCGCC
NciI          CCSGG
NcoI          CCATGG
NdeI          CATATG
NgoMIV        GCCGGC
NheI          GCTAGC
NlaIII        CATG
NlaIV         GGNNCC
NotI          GCGGCCGC
NruI          TCGCGA
NsiI          ATGCAT
NspI          RCATGY
PacI          TTAATTAA
PaeR7I        CTCGAG
PciI          ACATGT
PflFI         GACNNNGTC
PflMI         CCANNNNNTGG
PleI          GAGTC
PmeI          GTTTAAAC
PmlI          CACGTG
PpuMI         RGGWCCY
PshAI         GACNNNNGTC
PsiI          TTATAA
PspGI         CCWGG
PspOMI        GGGCCC
PstI          CTGCAG
PvuI          CGATCG
PvuII         CAGCTG
RsaI          GTAC
RsrII         CGGWCCG
SacI          GAGCTC
SacII         CCGCGG
SalI          GTCGAC
SapI          GCTCTTC
Sau3AI        GATC






Sau96I        GGNCC
SbfI          CCTGCAGG
ScaI          AGTACT
ScrFI         CCNGG
SexAI         ACCWGGT
SfaNI         GCATC
SfcI          CTRYAG
SfiI          GGCCNNNNNGGCC
SfoI          GGCGCC
SgrAI         CRCCGGYG
SmaI          CCCGGG
SmlI          CTYRAG
SnaBI         TACGTA
SpeI          ACTAGT
SphI          GCATGC
SspI          AATATT
StuI          AGGCCT
StyI          CCWWGG
SwaI          ATTTAAAT
TaqI          TCGA
TfiI          GAWTC
TliI          CTCGAG
TseI          GCWGC
Tsp45I        GTSAC
Tsp509I       AATT
TspRI         CAGTG
Tth111I       GACNNNGTC
XbaI          TCTAGA
XcmI          CCANNNNNNNNNTGG
XhoI          CTCGAG
XmaI          CCCGGG
XmnI          GAANNNNTTC



I. Polymerases 

1. Reverse Transcriptases 

According to the present invention, a variety of different reverse transcriptases 
may be utilized. The following are representative examples. 

M-MLV Reverse Transcriptase. M-MLV (Moloney Murine Leukemia Virus 
Reverse Transcriptase) is an RNA-dependent DNA polymerase requiring a DNA primer 
and an RNA template to synthesize a complementary DNA strand. The enzyme is a 
product of the pol gene of M-MLV and consists of a single subunit with a molecular
weight of 71 kDa. M-MLV RT has a weaker intrinsic RNase H activity than Avian
Myeloblastosis Virus (AMV) reverse transcriptase, which is important for achieving long
full-length complementary DNA (>7 kb).

M-MLV can be used for first strand cDNA synthesis and primer extensions.
Storage is recommended at -20°C in 20 mM Tris-HCl (pH 7.5), 0.2 M NaCl, 0.1 mM EDTA,
1 mM DTT, 0.01% Nonidet® P-40, 50% glycerol. The standard reaction conditions are
50 mM Tris-HCl (pH 8.3), 7 mM MgCl2, 40 mM KCl, 10 mM DTT, 0.1 mg/ml BSA,
0.5 mM 3H-dTTP, 0.025 mM oligo(dT)50, 0.25 mM poly(A)400 at 37°C.

M-MLV Reverse Transcriptase, RNase H Minus. This is a form of Moloney
murine leukemia virus reverse transcriptase (RNA-dependent DNA polymerase) which
has been genetically altered to remove the associated ribonuclease H activity (Tanese and
Goff, 1988). It can be used for first strand cDNA synthesis and primer extension.
Storage is at -20°C in 20 mM Tris-HCl (pH 7.5), 0.2 M NaCl, 0.1 mM EDTA, 1 mM DTT,
0.01% Nonidet® P-40, 50% glycerol.

AMV Reverse Transcriptase. Avian Myeloblastosis Virus reverse transcriptase
is an RNA-dependent DNA polymerase that uses single-stranded RNA or DNA as a
template to synthesize the complementary DNA strand (Houts et al., 1979). It has
activity at high temperature (42°C - 50°C). This polymerase has been used to synthesize
long cDNA molecules.

Reaction conditions are 50 mM Tris-HCl (pH 8.3), 20 mM KCl, 10 mM MgCl2,
500 µM of each dNTP, 5 mM dithiothreitol, 200 µg/ml oligo-dT(12-18), 250 µg/ml
polyadenylated RNA, 6.0 pmol 32P-dCTP, and 30 U enzyme in a 7 µl volume. Incubate
45 min at 42°C. Storage buffer is 200 mM KPO4 (pH 7.4), 2 mM dithiothreitol, 0.2%
Triton X-100, and 50% glycerol. AMV may be used for first strand cDNA synthesis,
RNA or DNA dideoxy chain termination sequencing, and fill-ins or other DNA
polymerization reactions for which Klenow polymerase is not satisfactory (Maniatis
et al., 1976).

2. DNA polymerases

The present invention also contemplates the use of various DNA polymerases.
Exemplary polymerases are described below.

Bst DNA Polymerase, Large Fragment. Bst DNA Polymerase Large Fragment
is the portion of the Bacillus stearothermophilus DNA Polymerase protein that contains
the 5'→3' polymerase activity, but lacks the 5'→3' exonuclease domain. Bst
Polymerase Large Fragment is prepared from an E. coli strain containing a genetic fusion
of the Bacillus stearothermophilus DNA Polymerase gene, lacking the 5'→3'
exonuclease domain, and the gene coding for E. coli maltose binding protein (MBP).
The fusion protein is purified to near homogeneity and the MBP portion is cleaved off
in vitro. The remaining polymerase is purified free of MBP (Iiyy et al., 1991).

Bst DNA Polymerase can be used in DNA sequencing through high GC regions
(Hugh & Griffin, 1994; McClary et al., 1991), and Rapid Sequencing from nanogram
amounts of DNA template (Mead et al., 1991). The reaction buffer is 1X ThermoPol
Buffer (20 mM Tris-HCl (pH 8.8 at 25°C), 10 mM KCl, 10 mM (NH4)2SO4, 2 mM
MgSO4, 0.1% Triton X-100). Supplied with enzyme as a 10X concentrated stock.

Bst DNA Polymerase does not exhibit 3'→5' exonuclease activity. 100 µg/ml BSA
or 0.1% Triton X-100 is required for long term storage. Reaction temperatures above
70°C are not recommended. Heat inactivated by incubation at 80°C for 10 min. Bst
DNA Polymerase cannot be used for thermal cycle sequencing. Unit assay conditions are
50 mM KCl, 20 mM Tris-HCl (pH 8.8), 10 mM MgCl2, 30 nM M13mp18 ssDNA, 70 nM
M13 sequencing primer (-47) 24 mer (NEB #1224), 200 µM dATP, 200 µM dCTP,
200 µM dGTP, 100 µM 3H-dTTP, 100 µg/ml BSA and enzyme. Incubate at 65°C.
Storage buffer is 50 mM KCl, 10 mM Tris-HCl (pH 7.5), 1 mM dithiothreitol, 0.1 mM
EDTA, 0.1% Triton X-100 and 50% glycerol. Storage is at -20°C.

VentR® DNA Polymerase and VentR® (exo-) DNA Polymerase. VentR DNA
Polymerase is a high-fidelity thermophilic DNA polymerase. The fidelity of VentR DNA
Polymerase is 5-15-fold higher than that observed for Taq DNA Polymerase (Mattila
et al., 1991; Eckert and Kunkel, 1991). This high fidelity derives in part from an integral
3'→5' proofreading exonuclease activity in VentR DNA Polymerase (Mattila et al., 1991;
Kong et al., 1993). Greater than 90% of the polymerase activity remains following a 1 h
incubation at 95°C.

VentR (exo-) DNA Polymerase has been genetically engineered to eliminate the
3'→5' proofreading exonuclease activity associated with VentR DNA Polymerase (Kong
et al., 1993). This is the preferred form for high-temperature dideoxy sequencing
reactions and for high yield primer extension reactions. The fidelity of polymerization by
this form is reduced to a level about 2-fold higher than that of Taq DNA Polymerase
(Mattila et al., 1991; Eckert & Kunkel, 1991). VentR (exo-) DNA Polymerase is an
excellent choice for DNA sequencing (see pages 118 and 121).

Both VentR and VentR (exo-) are purified from strains of E. coli that carry the
Vent DNA Polymerase gene from the archaea Thermococcus litoralis (Perler et al.,
1992). The native organism is capable of growth at up to 98°C and was isolated from a
submarine thermal vent (Belkin and Jannasch, 1985). They are useful in primer
extension, thermal cycle sequencing and high temperature dideoxy-sequencing.

Deep VentR™ DNA Polymerase and Deep VentR™ (exo-) DNA Polymerase.
Deep VentR DNA Polymerase is the second high-fidelity thermophilic DNA polymerase
available from New England Biolabs. The fidelity of Deep VentR DNA Polymerase is
derived in part from an integral 3'→5' proofreading exonuclease activity. Deep VentR is
even more stable than VentR at temperatures of 95 to 100°C (see graph).

Deep VentR (exo-) DNA Polymerase has been genetically engineered to eliminate
the 3'→5' proofreading exonuclease activity associated with Deep VentR DNA
Polymerase. This exo- version can be used for DNA sequencing but requires different
dNTP/ddNTP ratios than those used with VentR (exo-) DNA Polymerase. Both Deep
VentR and Deep VentR (exo-) are purified from a strain of E. coli that carries the Deep
VentR DNA Polymerase gene from Pyrococcus species GB-D (Perler et al., 1996). The
native organism was isolated from a submarine thermal vent at 2010 meters (Jannasch
et al., 1992) and is able to grow at temperatures as high as 104°C. Both enzymes can be
used in primer extension, thermal cycle sequencing and high temperature
dideoxy-sequencing.

T7 DNA Polymerase (unmodified). T7 DNA polymerase catalyzes the
replication of T7 phage DNA during infection. The protein dimer has two catalytic
activities: DNA polymerase activity and strong 3'→5' exonuclease (Hori et al., 1979;
Engler et al., 1983; Nordstrom et al., 1981). The high fidelity and rapid extension rate of
the enzyme make it particularly useful in copying long stretches of DNA template.

T7 DNA Polymerase consists of two subunits: T7 gene 5 protein (84 kilodaltons)
and E. coli thioredoxin (12 kilodaltons) (Hori et al., 1979; Studier et al., 1990; Grippo &
Richardson, 1971; Modrich & Richardson, 1975; Adler & Modrich, 1979). Each protein
is cloned and overexpressed in a T7 expression system in E. coli (Studier et al., 1990). It
can be used in second strand synthesis in site-directed mutagenesis protocols (Bebenek &
Kunkel, 1989).

The reaction buffer is 1X T7 DNA Polymerase Buffer (20 mM Tris-HCl (pH 7.5),
10 mM MgCl2, 1 mM dithiothreitol). Supplement with 0.05 mg/ml BSA and dNTPs.
Incubate at 37°C. The high polymerization rate of the enzyme makes long incubations
unnecessary. T7 DNA Polymerase is not suitable for DNA sequencing.

Unit assay conditions are 20 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 1 mM
dithiothreitol, 0.05 mg/ml BSA, 0.15 mM each dNTP, 0.5 mM heat denatured calf
thymus DNA and enzyme. Storage conditions are 50 mM KPO4 (pH 7.0), 0.1 mM
EDTA, 1 mM dithiothreitol and 50% glycerol. Store at -20°C.

DNA Polymerase I (E. coli). DNA Polymerase I is a DNA-dependent DNA
polymerase with inherent 3'→5' and 5'→3' exonuclease activities (Lehman, 1981). The
5'→3' exonuclease activity removes nucleotides ahead of the growing DNA chain,
allowing nick-translation. It is isolated from E. coli CM 5199, a lysogen carrying λpolA
transducing phage (obtained from N.E. Murray) (Murray & Kelley, 1979). The phage in
this strain was derived from the original polA phage encoding wild-type Polymerase I.

Applications include nick translation of DNA to obtain probes with a high
specific activity (Meinkoth and Wahl, 1987) and second strand synthesis of cDNA
(Gubler & Hoffmann, 1983; D'Alessio & Gerard, 1988). The reaction buffer is E. coli
Polymerase I/Klenow Buffer (10 mM Tris-HCl (pH 7.5), 5 mM MgCl2, 7.5 mM
dithiothreitol). Supplement with dNTPs.

Heat inactivation is for 20 min at 75°C. Unit assay conditions are 40 mM
KPO4 (pH 7.5), 6.6 mM MgCl2, 1 mM 2-mercaptoethanol, 20 µM dAT copolymer,
33 µM dATP and 33 µM 3H-dTTP. Storage conditions are 0.1 M KPO4 (pH 6.5), 1 mM
dithiothreitol, and 50% glycerol. Store at -20°C.

DNA Polymerase I, Large (Klenow) Fragment. Klenow fragment is a
proteolytic product of E. coli DNA Polymerase I that retains polymerization and 3'→5'
exonuclease activity, but has lost 5'→3' exonuclease activity. Klenow retains the
polymerization fidelity of the holoenzyme without degrading 5' termini.

The enzyme is produced from a genetic fusion of the E. coli polA gene in which the
5'→3' exonuclease domain is replaced by maltose binding protein (MBP). Klenow
Fragment is cleaved from the fusion and purified away from MBP. The resulting Klenow
fragment has the identical amino and carboxy termini as the conventionally prepared
Klenow fragment.

Applications include DNA sequencing by the Sanger dideoxy method (Sanger
et al., 1977), fill-in of 3' recessed ends (Sambrook et al., 1989), second-strand cDNA
synthesis, random priming labeling and second strand synthesis in mutagenesis protocols
(Gubler, 1987).

Reaction conditions are 1X E. coli Polymerase I/Klenow Buffer (10 mM Tris-HCl
(pH 7.5), 5 mM MgCl2, 7.5 mM dithiothreitol). Supplement with dNTPs (not
included). Klenow fragment is also 50% active in all four standard NEBuffers when
supplemented with dNTPs. Heat inactivated by incubating at 75°C for 20 min. Fill-in
conditions: DNA should be dissolved, at a concentration of 50 µg/ml, in one of the four
standard NEBuffers (1X) supplemented with 33 µM each dNTP. Add 1 unit Klenow per
µg DNA and incubate 15 min at 25°C. Stop reaction by adding EDTA to 10 mM final
concentration and heating at 75°C for 10 min. Unit assay conditions are 40 mM KPO4
(pH 7.5), 6.6 mM MgCl2, 1 mM 2-mercaptoethanol, 20 µM dAT copolymer, 33 µM
dATP and 33 µM 3H-dTTP. Storage conditions are 0.1 M KPO4 (pH 6.5), 1 mM
dithiothreitol, and 50% glycerol. Store at -20°C.
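
The fill-in conditions above reduce to simple arithmetic. The following Python sketch shows one way to work it out; the function name and the stock concentrations (10 mM for each dNTP, 500 mM EDTA) are assumptions chosen for illustration and are not specified in the text.

```python
def klenow_fill_in_plan(dna_ug: float,
                        dntp_stock_mM: float = 10.0,    # assumed stock, not from the text
                        edta_stock_mM: float = 500.0):  # assumed stock, not from the text
    """Work out a Klenow fill-in reaction for dna_ug micrograms of DNA.

    Uses the stated conditions: DNA at 50 ug/ml, 33 uM each dNTP,
    1 unit Klenow per ug DNA, EDTA to 10 mM final to stop the reaction.
    """
    if dna_ug <= 0:
        raise ValueError("dna_ug must be positive")
    if edta_stock_mM <= 10.0:
        raise ValueError("EDTA stock must be more concentrated than 10 mM")

    reaction_ul = dna_ug / 50.0 * 1000.0                 # volume giving 50 ug/ml
    klenow_units = 1.0 * dna_ug                          # 1 unit per ug DNA
    # Volume of each dNTP stock for 33 uM (0.033 mM); the small added
    # volume is ignored in this approximation.
    dntp_each_ul = reaction_ul * 0.033 / dntp_stock_mM
    # EDTA volume v solving: stock * v / (reaction + v) = 10 mM
    edta_ul = reaction_ul * 10.0 / (edta_stock_mM - 10.0)

    return {
        "reaction_ul": reaction_ul,
        "klenow_units": klenow_units,
        "dntp_each_ul": dntp_each_ul,
        "edta_ul": edta_ul,
        "incubate": "15 min at 25 C",
        "stop": "EDTA to 10 mM, then 75 C for 10 min",
    }


if __name__ == "__main__":
    for key, value in klenow_fill_in_plan(5.0).items():
        print(key, value)
```

For 5 µg of DNA this gives a 100 µl reaction with 5 units of enzyme; the dNTP and EDTA volumes depend entirely on the assumed stock concentrations.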

Klenow Fragment (3'→5' exo-). Klenow Fragment (3'→5' exo-) is a proteolytic
product of DNA Polymerase I which retains polymerase activity, but has a mutation
which abolishes the 3'→5' exonuclease activity and has lost the 5'→3' exonuclease
(Derbyshire et al., 1988).

The enzyme is produced from a genetic fusion of the E. coli polA gene in which the
3'→5' exonuclease domain is genetically altered and the 5'→3' exonuclease domain is
replaced by maltose binding protein (MBP). Klenow Fragment exo- is cleaved from the
fusion and purified away from MBP. Applications include random priming labeling, DNA
sequencing by the Sanger dideoxy method (Sanger et al., 1977), second strand cDNA
synthesis and second strand synthesis in mutagenesis protocols (Gubler, 1987).

Reaction buffer is 1X E. coli Polymerase I/Klenow Buffer (10 mM Tris-HCl
(pH 7.5), 5 mM MgCl2, 7.5 mM dithiothreitol). Supplement with dNTPs. Klenow
Fragment exo- is also 50% active in all four standard NEBuffers when supplemented
with dNTPs. Heat inactivated by incubating at 75°C for 20 min. When using Klenow
Fragment (3'→5' exo-) for sequencing DNA using the dideoxy method of Sanger et al.
(1977), an enzyme concentration of 1 unit/5 µl is recommended.

Unit assay conditions are 40 mM KPO4 (pH 7.5), 6.6 mM MgCl2, 1 mM
2-mercaptoethanol, 20 µM dAT copolymer, 33 µM dATP and 33 µM 3H-dTTP. Storage
conditions are 0.1 M KPO4 (pH 7.5), 1 mM dithiothreitol, and 50% glycerol. Store at
-20°C.

T4 DNA Polymerase. T4 DNA Polymerase catalyzes the synthesis of DNA in
the 5'→3' direction and requires the presence of template and primer. This enzyme has a
3'→5' exonuclease activity which is much more active than that found in DNA
Polymerase I. Unlike E. coli DNA Polymerase I, T4 DNA Polymerase does not have a
5'→3' exonuclease function.

Purified from a strain of E. coli that carries a T4 DNA Polymerase overproducing
plasmid. Applications include removing 3' overhangs to form blunt ends (Tabor &
Struhl, 1989; Sambrook et al., 1989), 5' overhang fill-in to form blunt ends (Tabor &
Struhl, 1989; Sambrook et al., 1989), single strand deletion subcloning (Dale et al.,
1985), second strand synthesis in site-directed mutagenesis (Kunkel et al., 1987), and
probe labeling using replacement synthesis (Tabor & Struhl, 1989; Sambrook et al.,
1989).

The reaction buffer is 1X T4 DNA Polymerase Buffer (50 mM NaCl, 10 mM
Tris-HCl, 10 mM MgCl2, 1 mM dithiothreitol (pH 7.9 at 25°C)). Supplement with
40 µg/ml BSA and dNTPs (not included in supplied 10X buffer). Incubate at temperature
suggested for specific protocol.

It is recommended to use 100 µM of each dNTP, 1-3 units polymerase/µg DNA
and incubation at 12°C for 20 min in the above reaction buffer (Tabor & Struhl, 1989;
Sambrook et al., 1989). Heat inactivated by incubating at 75°C for 10 min. T4 DNA
Polymerase is active in all four standard NEBuffers when supplemented with dNTPs.

Unit assay conditions are 50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 1 mM
dithiothreitol (pH 7.9 at 25°C), 33 µM dATP, dCTP and dGTP, 33 µM 3H-dTTP,
70 µg/ml denatured calf thymus DNA, and 170 µg/ml BSA. Note: These are not
suggested reaction conditions; refer to Reaction Buffer. Storage conditions are 100 mM
KPO4 (pH 6.5), 10 mM 2-mercaptoethanol and 50% glycerol. Store at -20°C.

3. RNA polymerases 

RNA polymerases for use in the present invention are exemplified as follows. 



T7 RNA Polymerase, SP6 RNA Polymerase and T3 RNA Polymerase.
Initiation of transcription with T7, SP6 and T3 RNA Polymerases is
highly specific for the T7, SP6 and T3 phage promoters, respectively. Cloning vectors have
been developed which direct transcription from the T7, SP6 or T3 promoter through
polylinker cloning sites (Schenborn & Meirendorf, 1985). These vectors allow in vitro
synthesis of defined RNA transcripts from a cloned DNA sequence. Under optimal
conditions, greater than 700 moles of T7 RNA transcript can be synthesized per mole of
DNA template (Noren et al., 1990). RNA produced using the SP6 and T7 RNA
polymerases is biologically active as mRNA (Krieg & Melton, 1984) and can be
accurately spliced (Green et al., 1983). Anti-sense RNA, produced by reversing the
orientation of the cloned DNA insert, has been shown to specifically block mRNA
translation in vivo (Melton, 1985).

Labeled single-stranded RNA transcripts of high specific activity are simple to
prepare with T7 and SP6 RNA polymerases (Sambrook et al., 1989). Increased levels of
detection in nucleic acid hybridization reactions can also be obtained due to the greater
stability of RNA:DNA hybrids with respect to RNA:RNA or DNA:DNA hybrids (Zinn
et al., 1983).

SP6 RNA Polymerase is isolated from SP6 phage-infected Salmonella
typhimurium LT2 (Butler & Chamberlin, 1982). T7 RNA Polymerase is isolated from
E. coli BL21 carrying the plasmid pAR1219 which contains T7 gene 1 under the control
of the inducible lac UV5 promoter (Davanloo et al., 1984). Applications include
preparation of radiolabeled RNA probes (Sambrook et al., 1989), RNA generation for in
vitro translation (Sambrook et al., 1989), RNA generation for studies of RNA structure,
processing and catalysis (Sambrook et al., 1989) and expression control via antisense
RNA.

The reaction buffer is 1X RNA Polymerase Buffer (40 mM Tris-HCl (pH 7.9), 6 mM MgCl2,
2 mM spermidine, 10 mM dithiothreitol). Supplement with 0.5 mM each ATP, UTP,
GTP and CTP (not included). Incubate at 37°C (T7 RNA polymerase) or 40°C (SP6 RNA
polymerase).

Dithiothreitol is required for activity. Both enzymes are extremely sensitive to
salt inhibition. For best results overall salt concentration should not exceed 50 mM. SP6
RNA polymerase is 30% more active at 40°C than at 37°C. Higher yields of RNA may
be obtained by raising NTP concentrations (up to 4 mM each). Mg2+ concentration
should be raised to 4 mM above the total NTP concentration. Additionally, inorganic
pyrophosphatase should be added to a final concentration of 4 units/ml. SP6 RNA
polymerase is supplied with a control template (NEB #207B). The template is a pSP64
vector containing a 1.38 kb insert, linearized at 3 different restriction sites. Transcription
with SP6 RNA polymerase results in three runoff fragments of 1.38 kb, 0.55 kb and
0.22 kb.
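
The Mg2+ rule stated above (4 mM above the total NTP concentration) and the salt limit are easy to check numerically. The Python sketch below is only an illustration; the function names are hypothetical, and "total NTP" is taken here to mean the sum over the four nucleotides.

```python
def required_mg_mM(ntp_each_mM: float, n_ntps: int = 4, excess_mM: float = 4.0) -> float:
    """Mg2+ concentration that is excess_mM above the total NTP concentration."""
    if ntp_each_mM <= 0:
        raise ValueError("ntp_each_mM must be positive")
    if ntp_each_mM > 4.0:
        raise ValueError("the text suggests raising NTPs only up to 4 mM each")
    return n_ntps * ntp_each_mM + excess_mM


def salt_within_limit(total_salt_mM: float, limit_mM: float = 50.0) -> bool:
    """True when the overall salt concentration does not exceed the stated 50 mM."""
    return total_salt_mM <= limit_mM


if __name__ == "__main__":
    print(required_mg_mM(0.5))   # standard 0.5 mM each NTP
    print(required_mg_mM(4.0))   # maximum 4 mM each NTP
    print(salt_within_limit(60.0))
```

With the standard 0.5 mM of each NTP the rule gives 6 mM Mg2+, which agrees with the 6 mM MgCl2 in the reaction buffer listed above.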

Storage conditions are 100 mM NaCl, 50 mM Tris-HCl (pH 7.9), 1 mM EDTA,
20 mM 2-mercaptoethanol, 0.1% Triton X-100 and 50% glycerol. Store at -20°C.

T3 RNA polymerase is a DNA dependent RNA polymerase which exhibits
extremely high specificity for T3 promoter sequences. The enzyme will incorporate
32P, 35S and 3H-labeled nucleotide triphosphates. It is used in the synthesis of RNA
transcripts for hybridization probes, in vitro translation, RNase protection assays or RNA
processing substrates.

One unit of T3 RNA polymerase is defined as the amount of enzyme required to
catalyze the incorporation of 5 nmol of CTP into acid insoluble product in 60 minutes at
37°C in a total volume of 100 µl. The reaction conditions are as follows: 40 mM Tris-HCl
(pH 7.9), 6 mM MgCl2, 10 mM DTT, 10 mM NaCl, 2 mM spermidine, 0.5% Tween®-20,
0.5 mM each ATP, GTP, CTP, and UTP, 0.5 µCi [3H] CTP, and 2 µg supercoiled pSP6/T3
Vector DNA. Promega provides a T3 RNA polymerase extracted from recombinant E.
coli.



J. Analysis of Sequence Data/Bioinformatics

The sequences generated using GLGI can be used to match gene databases (e.g.,
GenBank, EMBL, DDBJ, UniGene Human Database). Each sequence will be identified
as a known gene, EST sequence, or novel sequence without matches. There are many
bioinformatic tools used for gene prediction in genomic DNA, for example, GenScan

K. Protein Purification

In the context of the present invention it will be desirable to isolate and purify
proteins. Protein purification techniques are well known to those of skill in the art.
These techniques involve, at one level, the crude fractionation of the cellular milieu to
polypeptide and non-polypeptide fractions. Having separated the polypeptide from other
proteins, the polypeptide of interest may be further purified using chromatographic and
electrophoretic techniques to achieve partial or complete purification (or purification to
homogeneity). Analytical methods particularly suited to the preparation of a pure peptide
are ion-exchange chromatography, exclusion chromatography; polyacrylamide gel
electrophoresis; isoelectric focusing. A particularly efficient method of purifying
peptides is fast protein liquid chromatography or even HPLC.

Certain aspects of the present invention concern the purification, and in particular
embodiments, the substantial purification, of an encoded protein or peptide. The term
"purified protein or peptide" as used herein, is intended to refer to a composition,
isolatable from other components, wherein the protein or peptide is purified to any degree
relative to its naturally-obtainable state. A purified protein or peptide therefore also
refers to a protein or peptide, free from the environment in which it may naturally occur.

Generally, "purified" will refer to a protein or peptide composition that has been
subjected to fractionation to remove various other components, and which composition
substantially retains its expressed biological activity. Where the term "substantially
purified" is used, this designation will refer to a composition in which the protein or
peptide forms the major component of the composition, such as constituting about 50%,
about 60%, about 70%, about 80%, about 90%, about 95% or more of the proteins in the
composition.

Various methods for quantifying the degree of purification of the protein or
peptide will be known to those of skill in the art in light of the present disclosure. These
include, for example, determining the specific activity of an active fraction, or assessing
the amount of polypeptides within a fraction by SDS/PAGE analysis. A preferred
method for assessing the purity of a fraction is to calculate the specific activity of the
fraction, to compare it to the specific activity of the initial extract, and to thus calculate
the degree of purity, herein assessed by a "-fold purification number." The actual units
used to represent the amount of activity will, of course, be dependent upon the particular
assay technique chosen to follow the purification and whether or not the expressed
protein or peptide exhibits a detectable activity.
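
The "-fold purification number" described above is a ratio of specific activities. The following is a minimal sketch of that arithmetic only; the function names and the example numbers are hypothetical and are not taken from the present disclosure.

```python
def specific_activity(total_activity_units, total_protein_mg):
    """Specific activity: activity units per mg of total protein."""
    if total_protein_mg <= 0:
        raise ValueError("total protein must be positive")
    return total_activity_units / total_protein_mg


def fold_purification(fraction, initial_extract):
    """Specific activity of a fraction divided by that of the initial extract.

    Each argument is a (total_activity_units, total_protein_mg) tuple.
    """
    return specific_activity(*fraction) / specific_activity(*initial_extract)


# Hypothetical numbers: the initial extract holds 1000 units in 500 mg protein,
# the purified fraction holds 600 units in 3 mg protein.
extract = (1000.0, 500.0)
fraction = (600.0, 3.0)

fold = fold_purification(fraction, extract)   # (600/3) / (1000/500)
recovery = fraction[0] / extract[0]           # share of total activity retained
```

Reporting recovery next to the fold value reflects the trade-off noted in this section: a less pure preparation may retain more of the total protein product or activity.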

Various techniques suitable for use in protein purification will be well known to
those of skill in the art. These include, for example, precipitation with ammonium
sulphate, PEG, antibodies and the like or by heat denaturation, followed by
centrifugation; chromatography steps such as ion exchange, gel filtration, reverse phase,
hydroxylapatite and affinity chromatography; isoelectric focusing; gel electrophoresis;
and combinations of such and other techniques. As is generally known in the art, it is
believed that the order of conducting the various purification steps may be changed, or
that certain steps may be omitted, and still result in a suitable method for the preparation
of a substantially purified protein or peptide.

There is no general requirement that the protein or peptide always be provided in
its most purified state. Indeed, it is contemplated that less substantially purified
products will have utility in certain embodiments. Partial purification may be
accomplished by using fewer purification steps in combination, or by utilizing different
forms of the same general purification scheme. For example, it is appreciated that
cation-exchange column chromatography performed utilizing an HPLC apparatus will
generally result in a greater "-fold" purification than the same technique utilizing a low
pressure chromatography system. Methods exhibiting a lower degree of relative
purification may have advantages in total recovery of protein product, or in maintaining
the activity of an expressed protein.

It is known that the migration of a polypeptide can vary, sometimes significantly,
with different conditions of SDS/PAGE (Capaldi et al., 1977). It will therefore be
appreciated that under differing electrophoresis conditions, the apparent molecular
weights of purified or partially purified expression products may vary.

High Performance Liquid Chromatography (HPLC) and FPLC are characterized
by a very rapid separation with extraordinary resolution of peaks. This is achieved by the
use of very fine particles and high pressure to maintain an adequate flow rate. Separation
can be accomplished in a matter of minutes, or at most an hour. Moreover, only a very
small volume of the sample is needed because the particles are so small and close-packed
that the void volume is a very small fraction of the bed volume. Also, the concentration
of the sample need not be very great because the bands are so narrow that there is very
little dilution of the sample.

Gel chromatography, or molecular sieve chromatography, is a special type of
partition chromatography that is based on molecular size. The theory behind gel
chromatography is that the column, which is prepared with tiny particles of an inert
substance that contain small pores, separates larger molecules from smaller molecules as
they pass through or around the pores, depending on their size. As long as the material of
which the particles are made does not adsorb the molecules, the sole factor determining
rate of flow is the size. Hence, molecules are eluted from the column in decreasing size,
so long as the shape is relatively constant. Gel chromatography is unsurpassed for
separating molecules of different size because separation is independent of all other
factors such as pH, ionic strength, temperature, etc. There also is virtually no adsorption,
less zone spreading, and the elution volume is related in a simple manner to molecular
weight.

Affinity Chromatography is a chromatographic procedure that relies on the
specific affinity between a substance to be isolated and a molecule that it can specifically
bind to. This is a receptor-ligand type interaction. The column material is synthesized by
covalently coupling one of the binding partners to an insoluble matrix. The column
material is then able to specifically adsorb the substance from the solution. Elution
occurs by changing the conditions to those in which binding will not occur (alter pH,
ionic strength, temperature, etc.).

A particular type of affinity chromatography useful in the purification of
carbohydrate containing compounds is lectin affinity chromatography. Lectins are a class
of substances that bind to a variety of polysaccharides and glycoproteins. Lectins are
usually coupled to agarose by cyanogen bromide. Concanavalin A coupled to Sepharose
was the first material of this sort to be used and has been widely used in the isolation of
polysaccharides and glycoproteins. Other lectins that have been used include lentil lectin,
wheat germ agglutinin, which has been useful in the purification of N-acetyl glucosaminyl
residues, and Helix pomatia lectin. Lectins themselves are purified using affinity
chromatography with carbohydrate ligands. Lactose has been used to purify lectins from
castor bean and peanuts; maltose has been useful in extracting lectins from lentils and
jack bean; N-acetyl-D galactosamine is used for purifying lectins from soybean; N-acetyl
glucosamine binds to lectins from wheat germ; D-galactosamine has been used in
obtaining lectins from clams and L-fucose will bind to lectins from lotus.

The matrix should be a substance that itself does not adsorb molecules to any
significant extent and that has a broad range of chemical, physical and thermal stability.
The ligand should be coupled in such a way as to not affect its binding properties. The
ligand should also provide relatively tight binding. And it should be possible to elute the
substance without destroying the sample or the ligand. One of the most common forms
of affinity chromatography is immunoaffinity chromatography. The generation of
antibodies that would be suitable for use in accord with the present invention is discussed
below.

L. Sequencing Proteins

Protein sequencing may be carried out by techniques well known in the art such
as those involving the sequential removal of amino acids from one end of the protein and
identifying each removed amino acid in turn (Edman's Degradation). Other techniques
to obtain amino acid sequence information use mass spectrometry, typically using fast
atom bombardment to ionize the sample. In fast atom bombardment, a sample dissolved
in a liquid is bombarded with atoms or ions. Charged molecules resulting from this
process are directed into the spectrometer and detected. An example of this technique is
described in the text entitled "Macro Molecular Sequencing and Synthesis Selected
Methods and Applications", 1988, published by Alan R. Liss, Inc., specifically at pages
83 to 99 in an article in such text entitled "Mass Spectrometry in Bio-Pharmaceutical
Research" by Steven A. Carr et al., 1988. Several modifications of these techniques are
well known to the skilled artisan and any of the techniques used for protein sequencing
may be used in the context of the present invention.

Typically, protein sequencing methods involve digesting the large protein
molecule into smaller fragments. These fragments are then separated or purified and then
subjected to the sequencing method.



1. Digesting Proteins

Digesting purified and/or isolated protein molecules to obtain smaller fragments
can be carried out using proteolytic enzymes, known as proteases, to obtain a variety of
N-terminal, C-terminal and internal fragments. Some of the well known proteases
include trypsin, chymotrypsin, elastase, collagenase, leupeptin, and endoproteinase.
Other protein digesting enzymes are also present and may be used in this invention and
are well known to one of ordinary skill in the art. Examples of fragments may
include contiguous residues of the protein sequence 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 75, 80, 85, 90, 95, 100, or
more amino acids in length.

2. Separating Protein Fragments

These digested protein fragments may be separated or further purified according
to known methods, such as precipitation, e.g., ammonium sulfate precipitation; HPLC; ion
exchange chromatography; affinity chromatography (including immunoaffinity
chromatography); and/or various size separations such as sedimentation, gel
electrophoresis (SDS-PAGE), gel filtration or molecular sieve chromatography. All these
methods are described above in detail.

High Performance Liquid Chromatography (HPLC) and FPLC are preferred
methods since they provide very rapid separation with extraordinary resolution of peaks.
Separation can be accomplished in a matter of minutes, or at most an hour, and
furthermore only a very small volume of the sample is needed. Also, the concentration of
the sample need not be very great because the bands are so narrow that there is very little
dilution of the sample. This is ideal for digested protein fragments.



M. Obtaining Nucleic Acid Sequences from Protein Sequences

The protein fragment sequences obtained above can then be used to obtain nucleic
acid sequences by techniques well known to one of skill in the art. The techniques
include artificial synthesis of nucleic acid polymers. Table 2 below describes the
degeneracy of codons and provides the corresponding amino acids. As known
to the skilled artisan, one can use the codon preference or bias of an organism if known.


TABLE 2

Amino Acids                   Codons

Alanine         Ala   A       GCA GCC GCG GCU
Cysteine        Cys   C       UGC UGU
Aspartic acid   Asp   D       GAC GAU
Glutamic acid   Glu   E       GAA GAG
Phenylalanine   Phe   F       UUC UUU
Glycine         Gly   G       GGA GGC GGG GGU
Histidine       His   H       CAC CAU
Isoleucine      Ile   I       AUA AUC AUU
Lysine          Lys   K       AAA AAG
Leucine         Leu   L       UUA UUG CUA CUC CUG CUU
Methionine      Met   M       AUG
Asparagine      Asn   N       AAC AAU
Proline         Pro   P       CCA CCC CCG CCU
Glutamine       Gln   Q       CAA CAG
Arginine        Arg   R       AGA AGG CGA CGC CGG CGU
Serine          Ser   S       AGC AGU UCA UCC UCG UCU
Threonine       Thr   T       ACA ACC ACG ACU
Valine          Val   V       GUA GUC GUG GUU
Tryptophan      Trp   W       UGG
Tyrosine        Tyr   Y       UAC UAU
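
The codon degeneracy listed in Table 2 determines how many distinct nucleic acid sequences can encode a given peptide fragment. The following sketch, offered only as an illustration, counts and enumerates those encodings and picks the least degenerate stretch of a peptide. The function names and the example peptide are hypothetical; the codon lists are those of Table 2.

```python
from itertools import product

# Codons from Table 2 (RNA alphabet), keyed by one-letter amino acid code.
CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCU"], "C": ["UGC", "UGU"],
    "D": ["GAC", "GAU"], "E": ["GAA", "GAG"], "F": ["UUC", "UUU"],
    "G": ["GGA", "GGC", "GGG", "GGU"], "H": ["CAC", "CAU"],
    "I": ["AUA", "AUC", "AUU"], "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CUA", "CUC", "CUG", "CUU"], "M": ["AUG"],
    "N": ["AAC", "AAU"], "P": ["CCA", "CCC", "CCG", "CCU"],
    "Q": ["CAA", "CAG"],
    "R": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "S": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "T": ["ACA", "ACC", "ACG", "ACU"], "V": ["GUA", "GUC", "GUG", "GUU"],
    "W": ["UGG"], "Y": ["UAC", "UAU"],
}


def degeneracy(peptide):
    """Number of distinct RNA sequences that encode the peptide."""
    count = 1
    for residue in peptide:
        count *= len(CODONS[residue])
    return count


def back_translate(peptide):
    """Enumerate every RNA sequence that encodes the peptide."""
    return ["".join(codons) for codons in product(*(CODONS[r] for r in peptide))]


def least_degenerate_window(peptide, width):
    """Return (start index, degeneracy) of the window with the fewest encodings."""
    starts = range(len(peptide) - width + 1)
    best = min(starts, key=lambda i: degeneracy(peptide[i:i + width]))
    return best, degeneracy(peptide[best:best + width])


# Hypothetical seven-residue fragment used only to exercise the functions.
window = least_degenerate_window("LSRMWKC", 4)
```

Choosing a window rich in Met and Trp keeps the number of candidate sequences small; a known codon bias for the organism, as noted above, would reduce it further.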



The nucleotide sequences generated in the present invention include those encoding the
isolated and purified protein fragments as described above. It will also be understood
that nucleic acid sequences (and their encoded amino acid sequences) may include
additional residues, such as additional 5' or 3' sequences.

N. Examples

The following examples are included to demonstrate preferred embodiments of 
the invention. It should be appreciated by those of skill in the art that the techniques 
disclosed in the examples which follow represent techniques discovered by the inventor 
to function well in the practice of the invention, and thus can be considered to constitute 
preferred modes for its practice. However, those of skill in the art should, in light of the 
present disclosure, appreciate that many changes can be made in the specific 
embodiments which are disclosed and still obtain a like or similar result without 
departing from the spirit and scope of the invention. 

EXAMPLE 1 
Materials and Methods 

SAGE tags. A group of SAGE tags 10 bases long were selected from the SAGE
tag sequence database for cells of normal colon (Zhang et al., 1997)
(http://www.ncbi.nlm.nih.gov/SAGE/sagerec.cgi?rec=166). Each selected SAGE
tag sequence was searched in the UniGene database
(http://www.ncbi.nlm.nih.gov/SAGE/SAGEtag.cgi?tag) to identify it as a matched or an
unmatched tag sequence. Each matched sequence was given the appropriate Unigene ID
number. Both matched and unmatched tags were used in the experiments.

RNA samples and cDNA synthesis. The same RNA sample from epithelium
cells of normal human colon tissue was used for this experiment (Zhang et al., 1997).
RNA samples from 24 different human tissues were also used for the detection of
multiple expression (Clontech). First strand cDNAs were generated through oligo-dT
priming with a cDNA synthesis kit (Life Technologies), following the manufacturer's
instructions. After cDNA synthesis, the excess free oligo-dT primers were removed using
a MicroSpin S-300 column (Amersham Pharmacia).



PCR conditions. Pfu DNA polymerase (Stratagene) was used with 10x buffer
(200 mM Tris-HCl pH 8.8, 100 mM KCl, 100 mM (NH4)2SO4, 20 mM MgSO4, 1%
Triton X-100, 1 mg/ml BSA). Two mM MgCl2 was added in each reaction to increase
the Mg2+ concentration. The PCR mixture contained 1x buffer, 2 mM MgCl2, 0.3 mM
dNTPs, 0.04 unit/µl Pfu polymerase, 3 ng/µl sense primer, 1.5 ng/µl anchored oligo-dT
primer (single or mixture) in a final volume of 20 or 50 µl. The PCR reactions were
performed first at 94°C for 1 min, followed by 5 cycles at 94°C 20 sec, 50 to 53°C 20 sec,
72°C 20 sec. The conditions were then changed to 25 cycles at 94°C 20 sec, 60°C 20 sec,
and 72°C 20 sec. The reactions were kept at 72°C for five minutes for the last cycle.
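
The Mg2+ contribution of the buffer and of the added MgCl2 can be checked with simple arithmetic, and the per-reaction amounts follow from the stated concentrations and the final volume. The sketch below is illustrative only; the function and constant names are hypothetical, and the numbers are those given in the paragraph above.

```python
MG_IN_10X_BUFFER_MM = 20.0   # MgSO4 in the 10x Pfu buffer
MG_ADDED_MM = 2.0            # MgCl2 added to each reaction


def dilute(stock_mm, fold):
    """Concentration after diluting a stock `fold` times."""
    return stock_mm / fold


def final_mg_mm():
    """Total Mg2+ in the 1x reaction: buffer contribution plus added MgCl2."""
    return dilute(MG_IN_10X_BUFFER_MM, 10) + MG_ADDED_MM


def reaction_amounts(volume_ul):
    """Primer mass and enzyme units for a reaction of the given final volume."""
    return {
        "sense_primer_ng": 3.0 * volume_ul,        # 3 ng/ul
        "anchored_oligo_dT_ng": 1.5 * volume_ul,   # 1.5 ng/ul
        "pfu_units": 0.04 * volume_ul,             # 0.04 unit/ul
    }


small = reaction_amounts(20)
large = reaction_amounts(50)
```

Under these figures the reaction carries 2 mM Mg2+ from the buffer plus 2 mM from the added MgCl2, which agrees with the 4 mM final concentration reported in Example 2.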

DNA cloning and sequencing. PCR amplified fragments were cloned into pCR-Blunt
vector (Invitrogen). Positive clones were screened using PCR with M13 reverse
and M13 forward (-20) primers located in the vector, or using EcoRI digestion. Plasmids
were prepared with a plasmid purification kit (Qiagen). Sequencing reactions were
performed with PE big-dye kit (PE Applied Biosystems) with M13 reverse primer,
following the manufacturer's instructions.

Database search. All the sequences generated from the clones were searched 
using the BLAST program for alignment (http://www.ncbi.nlm.nih.gov/BLAST/). 



EXAMPLE 2 
Results and Discussion 



The inventors envisioned that the amplification of a particular template
corresponding to a particular SAGE tag will proceed as depicted in the schematic in FIG.
1, using a combination of a sense primer containing a SAGE tag sequence and a
single-base anchored oligo-dT antisense primer. In this process, only the cDNA templates
containing the binding sequences for the SAGE tag will be annealed and extended in the
first PCR cycle. In the second cycle, the extension will only happen from that
single-base anchored oligo-dT primer which anneals at the 5' end of the poly-dA sequences
with the anchored nucleotide correctly paired to the last nucleotide before the poly-dA
sequence. Extension of all other anchored primers annealed along the poly-dA sequences
will be blocked because of the presence of the anchor nucleotide. The resulting extended
templates will exclude poly-dA/dT sequences. Only the cDNA templates containing the
SAGE tag sequence will undergo exponential amplification in the following PCR cycles.
Thus, only copies of the same size will be generated.
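
The pairing rule in the preceding paragraph can be written out as a small sketch: the anchor at the 3' end of the oligo-dT primer must be the complement of the last nucleotide before the poly-dA run. The function names, the `min_run` threshold, and the example sequence are hypothetical; the default of 11 dT residues follows the length reported below as giving the best results.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def last_base_before_poly_a(cdna_sense, min_run=10):
    """Return the sense-strand base immediately 5' of the terminal poly-dA run."""
    seq = cdna_sense.upper()
    trimmed = seq.rstrip("A")
    if not trimmed or len(seq) - len(trimmed) < min_run:
        raise ValueError("no terminal poly-dA run of the required length")
    return trimmed[-1]


def anchored_oligo_dt(last_base, n_dt=11):
    """Anchored oligo-dT primer (5'->3') expected to extend for this template."""
    base = last_base.upper()
    if base == "A":
        # By definition the base before the poly-dA run cannot itself be dA.
        raise ValueError("dA would be part of the poly-dA run")
    return "T" * n_dt + COMPLEMENT[base]


# Hypothetical template ending ...AGC followed by a poly-dA tail.
template = "GATTACAGC" + "A" * 15
primer = anchored_oligo_dt(last_base_before_poly_a(template))
```

Because the base before the run is never dA, only dA, dG and dC anchors are needed, which matches the three anchored primers used in this Example.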

The expected size distribution of amplified sequences using this strategy should
be up to several hundred bases, because of the use of NlaIII digestion in the SAGE
process for SAGE tag collection (Velculescu et al., 1995). NlaIII is a restriction enzyme
recognizing CATG. As shown in FIG. 2, the size distribution of NlaIII digested cDNA
was centered between 200 to 500 base pairs.
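
The relationship between the NlaIII site, the SAGE tag and the expected product size can be illustrated with a short sketch. It assumes, as in the SAGE procedure, that the tag is the 10 bases immediately 3' of the 3'-most CATG in the sense strand; the function name and the example sequence are hypothetical.

```python
def three_prime_tag(cdna_sense, tag_len=10, site="CATG"):
    """Return (tag, fragment_length) for the 3'-most NlaIII site, or None.

    fragment_length runs from the first base of the CATG site to the last base
    before the poly-dA tail, i.e. the size expected when poly-dA/dT is excluded.
    """
    seq = cdna_sense.upper().rstrip("A")      # drop the poly-dA tail
    pos = seq.rfind(site)                     # 3'-most CATG
    if pos < 0 or pos + len(site) + tag_len > len(seq):
        return None
    start = pos + len(site)
    return seq[start:start + tag_len], len(seq) - pos


# Hypothetical cDNA: 4 bases, CATG, a 10-base tag, 10 more bases, poly-dA tail.
example = "GGCC" + "CATG" + "GTCATCACCA" + "TTGGCCTTGG" + "A" * 20
result = three_prime_tag(example)
```

Real transcripts place the last CATG up to several hundred bases from the poly-dA tail, which is why the amplified fragments are expected in that size range.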

Design of primer. Each SAGE tag contains only a 10 base sequence. To
increase the length of the primers for efficient PCR priming, CATG, the NlaIII recognition
site used for collecting SAGE tag fragments (Velculescu et al., 1995), was added 5' of
the SAGE tag sequence. Additional bases were added 5' of the primer to
increase the primer size and to provide a potential site for subcloning. For the anchored
oligo-dT primers, a single-base anchor dA, dG, or dC was attached to the 3' end of the
oligo-dT primer (Khan et al., 1991; Kiriangkum et al., 1992; Liang and Pardee, 1992;
Liang et al., 1994; Wang and Rowley, 1998). To determine the best length of oligo-dT
sequences, different numbers of dT nucleotides from 11 to 20 were tested, with dT11
giving the best results.
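
The primer layout described above can be sketched as a simple string construction. The function name is hypothetical. The 5' extension defaults to GGATCC, the BamH I site described for the high-throughput primers in Example 3; whether the same extension was used in this Example is an assumption.

```python
def glgi_sense_primer(tag, site="CATG", five_prime_extension="GGATCC"):
    """Build a sense primer: 5' extension + NlaIII site + 10-base SAGE tag."""
    tag = tag.upper()
    if len(tag) != 10 or set(tag) - set("ACGT"):
        raise ValueError("a SAGE tag must be exactly 10 bases of A, C, G or T")
    return five_prime_extension + site + tag


# One of the SAGE tags listed in Table 3.
primer = glgi_sense_primer("GTCATCACCA")
```

With the default extension the primer is 20 bases long, of which the 14 bases at the 3' end (CATG plus the tag) are expected to anneal to the template.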

Optimizing PCR condition. Various PCR conditions were tested in order to
maximize the specificity and efficiency of amplification. In the PCR reaction, the
anchored primers were either combined separately with each sense primer, or a mixture
of equal amounts of dA, dG and dC anchored primers was used with the sense primer.
Pfu DNA polymerase was chosen for the PCR amplification because it showed greater
fidelity of amplification compared with regular Taq DNA polymerase (Lundberg et al.,
1991) (data not shown). The Mg2+ concentration played an important role in determining
the specificity and the yield of the PCR products. Satisfactory results were usually
obtained at the final concentration of 4 mM Mg2+. The number of PCR cycles is
important to maintain the specificity of the amplification. Over-amplification with a high
number of PCR cycles could result in non-specific amplification.

Amplification of longer sequences from SAGE tags. A group of SAGE tags
generated from colon tissues was selected for the analysis (Zhang et al., 1997) (Table 3).
PCR™ was performed with each sense primer containing the SAGE tag sequence and
individual or mixed anchored oligo-dT primers, combined with cDNAs from colon tissue
generated by oligo-dT priming. The PCR products were electrophoresed through an
agarose gel, and cloned into vector for sequencing analysis. FIG. 3 shows examples of
the PCR amplification with three SAGE tags that matched to known sequences. The last
nucleotide before the poly-dA sequences for those three sequences (Hs.184776, Hs.3463
and Hs.118786) is dT, dC, and dG respectively. The inventors obtained the expected
results. The amplification occurred only in the reaction with dA, dG and dC anchored
oligo-dT, respectively, for these three sequences. When the dA, dG and dC anchored
oligo-dT primers were mixed, the same products can be generated,
though the amplification efficiency was lower due to the competition of binding between
these three primers. These data indicate that the reaction can be simplified into a single
reaction using a combination of dA, dG and dC anchored oligo-dT primers. Table 3
summarizes the results generated from these experiments. For the matched SAGE tag
sequences, amplification occurred when the correct anchor primers were used except for
Hs.194659, which was amplified by dG anchored oligo-dT but the matched sequences
ended with dT. The size distribution of these amplified fragments ranged from 77 to 382
base pairs. cDNA fragments were also generated from three unmatched SAGE tags, and
they represent novel sequences.

Identify the correct sequence from multiple sequences that matched with the
same SAGE Tag. When matching SAGE tag sequences in databases, a single SAGE tag
may align with several sequences. For example, nine out of 40 SAGE tag sequences
show matches to multiple Unigene Clusters (Zhang et al., 1997). Other than sharing the
same SAGE tag sequence, these matched sequences have no homology and are derived
from various different tissues. To test this issue experimentally, 12 SAGE tags were used
for amplification with cDNA samples from 24 different tissues. Some
tags generated multiple templates. For example, the SAGE tag (GTCATCACCA)
generated five different sequences from five different tissues (fetal liver, skeletal muscle,
spinal cord, trachea and colon), and two different sequences from the same tissue, spinal
cord (Table 4). All of these fragments contained the same SAGE tag sequence, but the
rest of the sequences showed no homology. Among these sequences, the ones from colon
all matched the previously amplified sequences in the colon (Table 3). These results
indicate that a SAGE tag itself may not be sufficient to serve as a unique identifier for a
particular sequence. When several sequences share the same SAGE tag sequence, it is
important to distinguish which one of the matched sequences is the correct sequence
corresponding to the particular SAGE tag. To avoid the uncertainty when different
sequences are expressed from different tissues, it will be necessary to generate
sequences from the same tissue used to generate the SAGE tag. These
observations also indicate that relying only on a database search to identify the sequence
corresponding to a SAGE tag may provide misleading information. Direct amplification
of the specific template with the inventors' strategy will be very useful for confirmation of
the validity of a particular SAGE tag.

Table 3. Summary of GLGI results from SAGE Tags



SAGE Tags     Unigene ID   3' end nucleotide   Amplified by    Length of   Match to
(10 base)                  in matched          anchored        sequence    original
                           sequences*          oligo dT        (bp)        sequence**

GGAAGGTTTA    Hs.105484    dT/dG               dT              77          +
AGATCCCAAG    Hs.50813     dC/dG               dC              84          +
CTTATGGTCC    Hs.179608    dT                  dT              86          +
AGGATGGTCC    Hs.71779     dC                  dC              112         +
GTCATCACCA    Hs.32966     dC                  dC              119         +
GACCAGTGGC    Hs.143131    dC/dT               dC              135         +
CTGTTGGTGA    Hs.3463      dC                  dC              148         +
ACTGGGTCTA    Hs.227823    dG                  dG              150
TACGGTGTGG    Hs.105460    dC                  dC              166         +
CGGTGGGACC    Hs.99175     dC/dT/dG            dC              200         +
CCTTCAAATC    Hs.23118     dC/dT               dC              220         +
GGAGGCGCTC    Hs.33455     dT/dG               dT              238         +
AAGAAGATAG    Hs.73848     dT                  dT              317         +
GATCCCAACT    Hs.118786    dG/dT/dC            dG              329         +
GAACAGCTCA    Hs.194659    dT                  dG              382         +
AGGTGACTGG                                     dC              156
CACCTAGTTG                                     dT              170
CCTGTCTGCC                                     dT              249

*The 3' end nucleotides from all the sequences were included in each matched Unigene
cluster.
**The amplified sequences were matched to databases again. The last three sequences
have no matches and represent novel sequences.
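
The agreement between the anchor that gave amplification and the 3' end nucleotides of the matched sequences can be checked mechanically. The sketch below transcribes the fifteen matched rows of Table 3 and compares the two columns exactly as they are labelled in the table, without any assumption about strand; the variable and function names are hypothetical.

```python
# (Unigene ID, 3' end nucleotides in matched sequences, "amplified by" entry)
TABLE_3 = [
    ("Hs.105484", {"dT", "dG"}, "dT"),
    ("Hs.50813", {"dC", "dG"}, "dC"),
    ("Hs.179608", {"dT"}, "dT"),
    ("Hs.71779", {"dC"}, "dC"),
    ("Hs.32966", {"dC"}, "dC"),
    ("Hs.143131", {"dC", "dT"}, "dC"),
    ("Hs.3463", {"dC"}, "dC"),
    ("Hs.227823", {"dG"}, "dG"),
    ("Hs.105460", {"dC"}, "dC"),
    ("Hs.99175", {"dC", "dT", "dG"}, "dC"),
    ("Hs.23118", {"dC", "dT"}, "dC"),
    ("Hs.33455", {"dT", "dG"}, "dT"),
    ("Hs.73848", {"dT"}, "dT"),
    ("Hs.118786", {"dG", "dT", "dC"}, "dG"),
    ("Hs.194659", {"dT"}, "dG"),
]


def anchor_mismatches(rows):
    """Unigene IDs whose amplifying anchor is not among the listed 3' ends."""
    return [uid for uid, ends, amplified in rows if amplified not in ends]


mismatches = anchor_mismatches(TABLE_3)
```

Run over the table, the check flags a single entry, Hs.194659, which is the exception discussed in the text.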

During the course of the research, the inventors became aware of a report
describing a method, RAST-PCR (Rapid RT-PCR Analysis of Unknown SAGE Tags), for
analyzing unknown SAGE tags (van den Berg et al., 1999). The authors used a sense
primer that was designed based on a SAGE tag. However, the antisense primer was the
M13 sequence tailed to 5' oligo-dT24 used for cDNA synthesis. In the process of cDNA
synthesis, oligo-dT primers anneal randomly along the poly-A sequences in the mRNA
template. The resulting cDNAs include various lengths of poly-dA/dT sequences at the
3' end of the cDNA, even from the same mRNA template. Using the M13 sequence tailed
to the oligo-dT as the antisense primer for PCR will generate multiple fragments with
different sizes or a smear due to the inclusion of different lengths of poly-dA sequences.
Using the conditions described in that paper (van den Berg et al., 1999), the inventors
obtained the results the inventors expected, namely smears (FIG. 4).

Table 4. Detection of heterogeneous sequences in various tissues
containing the same SAGE tag



SAGE TAG      Positive tissues                    Unigene ID   Length of sequence

CGGTGGGACC    Colon, Thymus, Small intestine      Hs.99175     200
              Small intestine                     no match     368
              Thymus                              no match     90

AGATCCCAAG    Colon, Heart, Placenta, Thymus      Hs.50813     84
              Placenta                            no match     53
              Skeletal muscle                     Hs.85937     282
              Testis                              no match     227
              Thymus, Placenta                    no match     51

CTTATGGTCC    Bone marrow                         Hs.237416    393
              Bone marrow                         no match     144
              Colon                               Hs.179608    86

GTCATCACCA    Fetal liver, Spinal cord            Hs.222346    125
              Skeletal muscle                     Hs.1288      399
              Spinal cord                         Hs.9641      394
              Trachea                             no match     225
              Colon                               Hs.32966     136


The development of the GLGI method provides several potential applications.
First, it provides a strategy for even wider application of the SAGE technique for
quantitative analysis of global gene expression. Second, it can be used to identify the 3'
cDNA sequence from any exon within a gene. These exons can include the exons
predicted by bioinformatic tools. Third, a combined application of SAGE/GLGI can be
applied to define the 3' boundary of expressed genes in the genomic sequences in human
and in other eukaryotic genomes.



EXAMPLE 3 
High-throughput GLGI 

A high-throughput GLGI procedure is also developed by the present inventors for
converting a large set of SAGE tag sequences into gene identities.

Materials and Methods. SAGE tags were selected from the SAGE tag
sequences generated from human and mouse myeloid cells, including 203 SAGE tags

single match was used as controls to demonstrate the specificity of GLGI amplification.

The same RNA samples from human and mouse myeloid cells used for SAGE
analysis were used as the templates for GLGI amplification. mRNAs from 5 µg of total
RNA of each sample were isolated with Oligo (dT)25 Dynabeads (Dynal), following the
manufacturer's protocol. Poly(dA/dT) cDNAs were synthesized using a cDNA synthesis
kit (Cat No: 18267-021, Life Technologies), and the 5' biotinylated, 3' anchored oligo
(dT) primers were used for first strand cDNA synthesis (5'
1" TA P GloCGOCCGC-T,a-A,G, CA.CG and CO) (Wang et al., 2000). The double-strand
cDNAs were digested with NlaIII, and the 3' cDNAs were recovered with
streptavidin beads (Dynal), following the manufacturer's protocol. In order to generate
enough 3' cDNAs for GLGI amplification,
SAGE linker A or B was ligated to the 3' cDNAs bound to the beads (Linker
A: 5'- TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG -3' and
5'- pTCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC [amino mod. C7] -3';
or Linker B: 5'- TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG -3'
and 5'- pTCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG [amino mod. C7] -3')
(http://www.sagenet.org/sage_protocol.htm). The ligated 3' cDNAs were then
amplified by 20 cycles of PCR at 94°C for 30 s, 55°C for 30 s, and 72°C for 30 s, with
PLATINUM Taq polymerase (Life Technologies), SAGE sense primer (5'-
GGATTTGCTGGTGCAGTACA -3' for linker A; or 5'-
CTGCTCGAATTCAAGCTTCT -3' for linker B;
http://www.sagenet.org/sage_protocol.htm) and antisense primer (5'-
ACTATCTAGAGCGGCCGCTT -3', located in the 5' end of anchored oligo dT primers
used for the first strand cDNA synthesis). The amplified templates were extracted by
phenol/chloroform, precipitated by ethanol/NH4OAc/glycogen, and resuspended in
buffer for GLGI amplification.

The sense primer used for GLGI amplification included 14 bases (CATG + 10
base SAGE tag sequence) at the 3' end and 6 bases (GGATCC, BamH I site) at the 5' of
the primer, giving a total of 20 bases for each primer: 5'-
GGATCCCATGXXXXXXXXXX -3' (Chen et al., 2000). Sense primers were
synthesized in 96 well format and the concentration was adjusted to 50 ng/µl with TE.
GLGI master mixtures were prepared for each reaction, containing 1x PCR buffer (20
mM TrisCl pH 8.4, 50 mM KCl), 2 mM MgCl2, 0.2 mM dNTPs, 1.5 units / 0.3 µl
PLATINUM Taq polymerase, 60 ng / 1.2 µl antisense primer (5'-
ACTATCTAGAGCGGCCGCTT -3'), and 0.5 - 5 ng of 3' cDNAs. The reaction mixtures
were aliquoted into a 96-well plate at 28.8 µl per well. Sense primers (60 ng / 1.2 µl)
were then added into each well. GLGI reactions were performed in PE GeneAmp PCR
Systems 9600 or 9700. The conditions used were 94°C for 2 min, followed by five
cycles at 94°C for 30 s, 55°C for 30 s, and 72°C for 30 s. The conditions were then
changed to 20-25 cycles at 94°C for 30 s, 60°C for 30 s, and 72°C for 30 s. Reactions
were kept at 72°C for 5 min for the last cycle. The amplified products were directly
precipitated in the 96-well PCR plate by adding 100 µl of precipitation mixture to each
well, containing 1 µl of glycogen (20 mg/ml, Roche), 15 µl of 7.5 M NH4OAc and 84 µl of
100% ethanol. The plate was sealed with Tape pads (QIAGEN, Inc.), vortexed, and kept
at room temperature for 15 min. After spinning at 4000 rpm for 35 min at 4°C
(SORVALL RC5C plus; rotor: SH3000), the supernatants were removed, 150 µl of 70%
ethanol were added per well to wash the DNA, and the plate was spun at 4000 rpm for
15 minutes. The supernatants were removed again, and the pellets were air-dried and
dissolved in 5 µl of dH2O. Two µl of DNA, 0.7 µl of salt solution, 0.7 µl of water, and 6
ng of pCR4-TOPO vector were used for each ligation reaction with TOPO TA cloning kit
for sequencing (Invitrogen). The ligation reactions were performed at room temperature
for 25 min. For transformation, 2 µl of ligation were mixed with 50 µl of TOP10
competent cells (Invitrogen), kept on ice for 20 min, then heated at 42°C for 30 s, and
moved on ice. SOC media (250 µl) were added per well. The plate was sealed and shaken
at 37°C for 60 min at 225 rpm. The transformants were spread on LB plates containing
50 µg/ml of kanamycin and grown overnight at 37°C. Positive clones were screened by direct
col„„y-PCR PGR master mixtures were prepared, containing Ix PGR buffer (10 mM 
TrisC. P H 8.3, 50 mM KC1, 1.5 mM MgCk), 0.1 mM dNTPs, 0.5 units / 0.1 p. Taq 
p„, ym erase (TaKaRa), 60 ng of sense primer (M13 reverse primer) and 60 ng of 
antisense primer (M.3 forward (-20) primer). The reaction mixtures were aliquoted into 
a 96-we,l plate a, 25,1 per we.., and co,o„ies were picked into the reaction mixtures with 
s ,en,e pipette tips. PCR was performed m PE GeneAmp PCR Systems 9600 or 9700. 
The cond„ions used were 94'C for 2 min, fo.lowed by 25 cycles at 94°C for 30 s, 55 C 
for 30 s and 72°C for 60 s. The reactions were kept a, 72°C for 5 min after the las, 
™,e 75,.! of precipitation mixture were added per well to precipitate DNAs, contatmng 
22 ul of dHA 15m of 2M NaC.C, and 38 ,1 of 2-propa„ol. The plate v,as sealed, 

fx, <™in After sDinnine at 4000 rpm for 35 nun 
vortexed, and kept at room temperature for 5 mm. After spinning 

at 4°C the supernatants were removed, 150,1 of 70% ethanol were added per well to 
wash the DNA, and the plate were spun a, 4OO0 rpm for 25 minute s. Supernatants were 



removed again, the pallets were air-dried, and dissolved in !0,1 SnB&T****** 
matures were prepared in a total volume of 7m , containing 0.8,1 of big-dye pre- 
mature 1 4,1 of dilution buffer (400 mM TrisCl pH 9.0, 10 mM MgCl 2 ), 30 ng / 0.3 ,1 
of sequence primer (M13 reverse primer or M13 forward (-20) primer), 1.5,1 H20, and 
3,1 of DNA templates. Sequencing reactions were performed at 96°C for 10 s, 50°C for 5 
s and 60°C for 4 min for 99 cycles. The final sequencing products were precipitated by 
adding 75^ of precipitation mixture, consisting of 64,1 of 100% ethanol/3M NaOAc 
mixture (25:1), 1,1 of glycogen (20 mg/ml, and 10,1 dHA The plate was sealed 
vortexed, and kept a, room temperature for 15 min. After spinning at 4000 rpm for 35 
min a. 4'C, the supernatants were removed, 1 50,1 of 70% ethanol were added per well to 
wash the DNA, and the piate were spun a, 4000 rpm for 15 minutes. The supernatants 
were removed, the pallets were air-dried, and dissolved in 3,1 of loading dye. One pi 
W as .oaded in 5% sequencing gels. Four to six Cones were sequenced for hrgher 
abundant SAGE tags, and 8 to 12 clones were sequenced for low abundant SAGE tags. 
Sequences were collected with an ABI 377 sequencer. 

All collected sequences were matched to the GenBank database (NR and ESTs, 
http://www.ncbi.nlm.nih.gov/BLAST/) through BLAST. Any mismatch between the 
sequence generated by GLGI amplification and the SAGE tag sequence or the 
reference sequence in the database was considered non-specific amplification, and such 
sequences were eliminated from further analysis. The matched sequence ID was used to 
search the UniGene database to obtain the UniGene cluster ID. 
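
The tag-consistency filter described above can be illustrated with a minimal Python sketch. It assumes that each sequencing read has already been trimmed of vector sequence and oriented 5' to 3' on the sense strand, so that a specific GLGI product begins with the NlaIII site CATG followed by the 10-base SAGE tag. The function names are hypothetical and are not part of the described procedure; the BLAST and UniGene lookups are not modelled. 

```python
ANCHOR = "CATG"  # NlaIII recognition site that precedes every SAGE tag


def matches_sage_tag(read, tag):
    """Return True if the read starts with CATG followed by the exact SAGE tag.

    Assumption: the read is vector-trimmed and in sense orientation.
    """
    read = read.upper()
    expected = ANCHOR + tag.upper()
    return read.startswith(expected)


def keep_specific_reads(reads, tag):
    """Drop reads whose 5' end does not match the tag (non-specific products)."""
    return [r for r in reads if matches_sage_tag(r, tag)]


if __name__ == "__main__":
    tag = "AAACCCGGGT"  # illustrative 10-base tag, not taken from the data
    reads = [
        "CATGAAACCCGGGTTTACGTAAAA",  # exact match to CATG + tag
        "CATGAAACCCGGCTTTACGTAAAA",  # one mismatch inside the tag
    ]
    print(keep_specific_reads(reads, tag))
```

Only reads that pass such a filter would then be carried forward to the database search. 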

Results and Discussion. The details of the high-throughput GLGI method are 
outlined in the figures. Double-strand poly(dA/dT) cDNAs are 
digested with NlaIII. The 3' fragments are recovered with streptavidin-coated beads. 
A large quantity of 3' cDNA templates can be generated by PCR, and GLGI 
amplification is then performed on these 3' cDNAs. Then, 3' cDNA fragments corresponding to 
the specific SAGE tag are generated, cloned and sequenced. All the procedures are 
performed in 96-well format to facilitate large-scale analyses. All the reagents used herein are 
optimized to guarantee the result and minimize expenses. 
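
As an illustration of the digestion step, the following Python sketch locates the last CATG (the NlaIII site) in a cDNA sense-strand sequence, returns the 3' fragment that would be recovered, and reads off the 10-base SAGE tag that follows the site. The function names and the example sequence are hypothetical. For convenience the returned fragment keeps the CATG at its 5' end, which is a simplification of the actual cut position of the enzyme. 

```python
NLAIII_SITE = "CATG"
TAG_LENGTH = 10  # SAGE tags are 10 bases long


def three_prime_fragment(cdna, site=NLAIII_SITE):
    """Return the sequence from the last NlaIII site to the 3' end, or None."""
    cdna = cdna.upper()
    cut = cdna.rfind(site)  # position of the 3'-most CATG
    if cut == -1:
        return None  # no NlaIII site: no 3' fragment is released
    return cdna[cut:]


def sage_tag(cdna, tag_length=TAG_LENGTH):
    """Return the bases immediately 3' of the last CATG, or None if too short."""
    fragment = three_prime_fragment(cdna)
    if fragment is None or len(fragment) < len(NLAIII_SITE) + tag_length:
        return None
    start = len(NLAIII_SITE)
    return fragment[start:start + tag_length]


if __name__ == "__main__":
    # Illustrative sequence with two CATG sites and a short poly(A) tail.
    cdna = "GGCATGTTTTCATGACGTACGTAAGGCCAAAAAAAA"
    print(three_prime_fragment(cdna))
    print(sage_tag(cdna))
```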






The high-throughput GLGI procedure has several differences as compared to the 
GLGI, for example, (i) 3' cDNAs instead of full-length cDNAs are used as the templates 
for GLGI amplification. This prevents artificial amplification from non-specific 
annealing of the sense primer to sequences upstream of the last CATG. The 3' cDNAs can 
be amplified to provide sufficient templates for GLGI amplification; (ii) a single 
antisense primer (5'-ACTATCTAGAGCGGCCGCTT-3') is used for all GLGI reactions. 
This antisense primer is located at the 3' end of all the cDNA templates, incorporated from 
the anchored oligo dT primers used for the first strand cDNA synthesis. It was 
observed that the anchored oligo dT primers are unstable, which can hinder the successful 
performance of GLGI. Use of the single primer also increased the efficiency of GLGI 
amplification significantly, as any annealing of this primer with the 3' end sequence results in 
extension during PCR. In contrast, the use of five anchored oligo dT primers results in an 
extension by PCR only when correctly paired primers anneal. This feature is particularly 
useful to amplify the templates with low copies; (iii) PLATINUM Taq polymerase 
instead of Pfu DNA polymerase was used for GLGI amplification, in order to increase 
the yield of final products while maintaining high specificity; (iv) the GLGI amplified 
DNAs were directly precipitated and cloned into vector without gel purification, to 
prevent the loss of amplified products. This is contemplated to be particularly important for 
products with short sizes and for products generated from templates with low copies. 
The inventors' data showed that these changes significantly increase efficiency and 
specificity for GLGI amplification of 3' cDNAs, especially for templates expressed at 
low level. 
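
The role of the single antisense primer in point (ii) can be sketched in Python as follows. The sketch assumes that templates are written as sense-strand sequences, so that the primer-binding site appears at the 3' end as the reverse complement of the primer. Only the primer sequence is taken from the text; the function names and the example template are hypothetical. 

```python
ANTISENSE_PRIMER = "ACTATCTAGAGCGGCCGCTT"  # primer sequence given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_primer_site_at_3prime_end(template_sense):
    """True if the sense-strand template ends with the primer-binding site."""
    site = reverse_complement(ANTISENSE_PRIMER)
    return template_sense.upper().endswith(site)


if __name__ == "__main__":
    site = reverse_complement(ANTISENSE_PRIMER)
    # Hypothetical 3' cDNA: CATG + tag + body + poly(A) + primer-binding site.
    template = "CATGACGTACGTAAGGCC" + "A" * 15 + site
    print(site)
    print(has_primer_site_at_3prime_end(template))
```

Because every template carries the same site, one primer serves all reactions, which is the design choice the text credits for the improved efficiency. 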

The SAGE tags selected for the analysis herein include SAGE tags with a single 
match, SAGE tags with multiple matches and SAGE tags without matches. FIG. 7 shows 
an example of the PCR amplifications. Table 5 summarizes these results. Nineteen out 
of 20 single-matched SAGE tags in the control reactions were converted into single 3' 
cDNA sequences and matched to the original matched single UniGene clusters. Seventy- 
nine out of 89 unmatched novel SAGE tags were converted into longer 3' cDNA 
sequences, as proved by the presence of a 3' poly dA/dT tail, no CATG site within the 
sequences, and no matches to known sequences. One hundred and eighty out of 203 
GLGI reactions from multiple matched SAGE tags generated 3' sequences, most of 
which (>90%) matched to a single UniGene cluster among the original multiple matched 
UniGene clusters. The efficiency of detection parallels the abundance of the 
SAGE tags. For higher abundant templates, the rate of success was nearly 100 percent. 
For the templates with low copies, the efficiency of detection was lower than that for high- 
abundance SAGE tags. The inventors contemplate that this effect can be caused by low 
levels of template that reach the limit of the amplification. 

Table 5. Summary of GLGI results 

Number of      Number of matched      SAGE tags      GLGI identified 
copy           UniGene clusters                      genes 

Over 50        Single match               6                6 
               Multiple match           150              136 
               No match                   3                3 

[illegible]    Single match               9                9 
               Multiple match            37               34 
               No match                  74               68 

[illegible]    Single match               5                4 
               Multiple match            16               10 
               No match                  12                8 

Total                                   312              278 
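
The counts in Table 5 can be cross-checked against the figures quoted in the text with a short Python sketch. The group labels "group 1" to "group 3" are placeholders for the three copy-number categories of the table and are an assumption of this sketch; the counts are those of Table 5. 

```python
from collections import defaultdict

# (copy-number group, match type, SAGE tags tested, tags identified by GLGI)
TABLE5 = [
    ("group 1", "Single match", 6, 6),
    ("group 1", "Multiple match", 150, 136),
    ("group 1", "No match", 3, 3),
    ("group 2", "Single match", 9, 9),
    ("group 2", "Multiple match", 37, 34),
    ("group 2", "No match", 74, 68),
    ("group 3", "Single match", 5, 4),
    ("group 3", "Multiple match", 16, 10),
    ("group 3", "No match", 12, 8),
]


def success_rates(rows):
    """Sum tested and identified tags per match type and compute the ratio."""
    tested = defaultdict(int)
    found = defaultdict(int)
    for _group, match_type, n_tags, n_found in rows:
        tested[match_type] += n_tags
        found[match_type] += n_found
    return {m: (tested[m], found[m], found[m] / tested[m]) for m in tested}


rates = success_rates(TABLE5)
total_tested = sum(t for t, _, _ in rates.values())
total_found = sum(f for _, f, _ in rates.values())

if __name__ == "__main__":
    for match_type, (n_tags, n_found, rate) in rates.items():
        print(f"{match_type}: {n_found}/{n_tags} ({rate:.1%})")
    print(f"Total: {total_found}/{total_tested} ({total_found / total_tested:.1%})")
```

The per-type sums reproduce the 19 of 20, 180 of 203 and 79 of 89 reported in the text, and the totals reproduce the 312 and 278 of the table. 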



Thus, the high-throughput GLGI procedure provides a tool for large- 
scale gene identification based on SAGE tag sequences. By using this procedure, 
hundreds of SAGE tags of interest can be simultaneously converted into their 3' cDNAs. 
A large number of genes in the genome are expressed at low levels, and these 
expressed genes can only be detected by the SAGE technique. The combination of the 
GLGI procedure with large sets of SAGE tags detected from low-copy templates provides 
an efficient way to identify these genes. Thus, the procedure will accelerate the 
completion of identification of expressed genes in the human genome as well as in other 
eukaryotic genomes. 

All of the compositions and/or methods disclosed and claimed herein can be made 
and executed without undue experimentation in light of the present disclosure. While the 
compositions and methods of this invention have been described in terms of preferred 
embodiments, it will be apparent to those of skill in the art that variations may be applied 
to the compositions and/or methods and in the steps or in the sequence of steps of the 
method described herein without departing from the concept, spirit and scope of the 
invention. More specifically, it will be apparent that certain agents which are both 
chemically and physiologically related may be substituted for the agents described herein 
while the same or similar results would be achieved. All such similar substitutes and 
modifications apparent to those skilled in the art are deemed to be within the spirit, scope 
and concept of the invention as defined by the appended claims. 






REFERENCES 



The following references, to the extent that they provide exemplary procedural or 
other details supplementary to those set forth herein, are specifically incorporated herein 
by reference. 

Beaucage and Iyer, Tetrahedron, 48:2223-2311, 1992. 
Bebenek and Kunkel, Nucl. Acids Res., 17:5408, 1989. 
Belkin and Jannasch, Arch. Microbiol., 141:181-186, 1985. 
Butler and Chamberlin, J. Biol. Chem., 257:5772-5778, 1982. 
Carr, Steven A., et al., "Mass Spectrometry in Bio-Pharmaceutical Research," in Macro 
   Molecular Sequencing and Synthesis: Selected Methods and Applications, Publ. 
   Alan R. Liss, Inc., pages 83-99, 1988. 
Chen J., Rowley J. D., Wang S. M., Proc. Natl. Acad. Sci. USA, 97:349-353, 2000. 



Dale et al., Plasmid, 13:31-40, 1985. 
D'Alessio and Gerard, Nucl. Acids Res., 16:1999-2014, 1988. 
Davanloo et al., Proc. Natl. Acad. Sci. USA, 81:2035-2039, 1984. 
Derbyshire et al., Science, 240:199-201, 1988. 
Eckert and Kunkel, PCR Methods and Applications, 1:17-24, 1991. 
Engler et al., J. Biol. Chem., 258:11165-11173, 1983. 
Gillam et al., J. Biol. Chem., 253:2532, 1978. 
Gillam et al., Nucleic Acids Res., 6:2973, 1979. 
Green et al., Cell, 32:681-694, 1983. 
Gubler and Hoffmann, Gene, 25:263-269, 1983. 
Gubler, Methods Enzymol., 152:330-335, 1987. 
Hashimoto et al., Blood, 94:845-52, 1999. 
Hibi et al., Cancer Res., 58:5690-5694, 1998. 
Hori et al., J. Biol. Chem., 254:11598-11604, 1979. 
Houts et al., J. Virol., 29:517-522, 1979. 
http://www.sagenet.org/sage_protocol.htm 
Hugh and Griffin, PCR Technology, 228-229, 1994. 






liyy et al., BioTechniques, 11:464, 1991. 
Itakura and Riggs, Science, 209:1401-1405, 1980. 
Itakura et al., J. Biol. Chem., 250:4592, 1975. 
[illegible] et al., Nucleic Acids Res., 19:[illegible], 1991. 
Khorana, Science, 203:614, 1979. 
Krnangkum et al., Nucleic Acids Res., 20:3793-3794, 1992. 
Krieg and Melton, Nucl. Acids Res., 12:7057-7070, 1984. 
Lehman, In: The Enzymes, Boyer (Ed.), Vol. 14A, pp 16-38, Academic Press, San 
   Diego, CA, 1981. 
Liang and Pardee, Science, 257:967-970, 1992. 
Liang et al., Nucleic Acids Res., 22:5763-5764, 1994. 
Lundberg et al., Gene, 108:1-6, 1991. 






Madden et al., Oncogene, 15:1079-1085, 1997. 
Maniatis et al., Cell, 8:163, 1976. 
Mattila et al., Nucleic Acids Res., 19:4967-4973, 1991. 
McClary et al., J. DNA Sequencing Mapping, 1(3):173-180, 1991. 
Mead et al., BioTechniques, 11(1):76-87, 1991. 
Meinkoth and Wahl, Methods Enzymol., 152:91-94, 1987. 
Melton, Proc. Natl. Acad. Sci. USA, 82:144-148, 1985. 
Murray and Kelley, Molec. Gen. Genet., 175:77-87, 1979. 
Nordstrom et al., J. Biol. Chem., 256:3112-3117, 1981. 
[illegible], Nucl. Acids Res., [illegible]. 
Perler et al., Proc. Natl. Acad. Sci. USA, 89(12):5577-81, 1992. 
Sambrook et al., In: Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold 
   Spring Harbor Laboratory, Cold Spring Harbor, NY, 1989. 
Sanger et al., Proc. Natl. Acad. Sci. USA. 
Schenborn and Mierendorf, Nucl. Acids Res., 13:6223-6236, 1985. 
Studier et al., Methods Enzymol., 185:60-89, 1990. 






Tabor and Struhl, In: Current Protocols in Molecular Biology, Ausubel et al. (Eds.), 
   John Wiley and Sons, NY, pp 3.5.10-3.5.12, 1989. 
Tanese and Goff, Proc. Natl. Acad. Sci. USA, 85:1977, 1988. 
U.S. Patent 4,704,362 
U.S. Patent 5,221,619 
U.S. Patent 5,583,013 
U.S. Patent 5,968,743 
U.S. Patent 4,659,774 
U.S. Patent 4,683,195 
U.S. Patent 4,683,202 
U.S. Patent 4,800,159 
U.S. Patent 4,816,571 
U.S. Patent 4,883,750 
U.S. Patent 4,959,463 
U.S. Patent 5,141,813 
U.S. Patent 5,207,880 
U.S. Patent 5,262,311 
U.S. Patent 5,264,566 
U.S. Patent 5,308,751 
U.S. Patent 5,360,523 
U.S. Patent 5,405,746 
U.S. Patent 5,428,148 
U.S. Patent 5,432,065 
U.S. Patent 5,455,008 
U.S. Patent 5,523,206 
U.S. Patent 5,554,744 
U.S. Patent 5,574,146 
U.S. Patent 5,602,244 
U.S. Patent 5,608,063 
U.S. Patent 5,639,608 
U.S. Patent 5,665,547 






U.S. Patent 5,674,716 
U.S. Patent 5,755,943 
U.S. Patent 5,780,232 
U.S. Patent 5,817,797 
U.S. Patent 5,821,058 
U.S. Patent 5,821,060 
U.S. Patent 5,846,727 
U.S. Patent 5,858,671 
U.S. Patent 5,866,330 
U.S. Patent 5,985,556 
U.S. Patent 6,004,446 
U.S. Patent 5,866,328 
U.S. Patent 5,876,934 
Van den Berg et al., Nucleic Acids Res., 27:e17, 1999. 






Velculescu et al., Cell, 88:243-251, 1997. 
Velculescu et al., Science, 270:484-487, 1995. 
Velculescu et al., Nat. Genet., 23:387-8, 1999. 
Wang and Rowley, Proc. Natl. Acad. Sci. USA, 95:11909-11914, 1998. 
Wang, S. M., Fears, S. C., Zhang, L., Chen, J. J., Rowley, J. D., Proc. Natl. Acad. Sci. 
   USA, 97:4162, 2000. 
WO 90/07641, filed December 21, 1990. 
Zhang et al., Science, 276:1268-1272, 1997. 
Zinn et al., Cell, 34:865-879, 1983. 






